Economic impacts: Interest rates and their effect on consumer spending and saving patterns
\(\color{darkblue}{\text{1. Executive Summary }}\)
This study delves into the impact of interest rates on US consumer behavior, examining savings, spending patterns, sentiment alterations, and generational variations. It uncovers correlations between interest rates and consumer actions, unveiling trends in savings, spending habits, and evolving sentiments.
Despite insights gained, limitations in dataset scope emphasize the need for broader, longitudinal studies to grasp comprehensive consumer behavior dynamics.
\(\color{darkblue}{\text{2. Introduction}}\)
\(\color{darkblue}{\text{2.1. Background and}}\) \(\color{darkblue}{\text{motivation}}\)
Interest rates are a key economic variable that affects a wide range of economic activity, including consumer spending and saving habits. Theoretically, when interest rates are low, it is cheaper for consumers to borrow money, which can encourage them to spend more on goods and services. Conversely, when interest rates are high, it is more expensive to borrow money, which can discourage spending and encourage saving. However, during the recent rate hikes, the economy and consumer behaviour have proved more robust than anticipated.
There are a number of reasons why it is important to understand how interest rates affect people’s spending and saving habits. First, consumer spending is a major driver of economic growth. When consumers spend more money, businesses produce more goods and services, which leads to job creation and higher incomes. Second, saving is important for individuals and households to achieve their financial goals, such as buying a home, retiring comfortably, or educating their children. In the past decade, interest rates have been at historically low levels due to low inflation. This prolonged period of low rates created a near-free-money environment and helped to stimulate economic growth. However, recent events such as the Russia-Ukraine war and the Covid-19 pandemic have caused inflation to rise. In response, central banks have initiated interest rate hikes, marking the end of the era characterised by exceptionally low interest rates. Given the importance of interest rates for consumer spending and saving habits, it is important to understand how changes in interest rates can affect the economy. Although the theoretical arguments are solid, consumer habits can shift across generations; for example, economic activity has remained more robust than conventional expectations of consumer behaviour would have predicted.
Understanding the influence of interest rates on consumer spending and saving behaviours is pivotal for individuals, households, and businesses seeking to make informed financial decisions and prepare for the future. Consequently, this report will conduct a comprehensive examination of the interplay between interest rates and consumer spending and saving patterns while also investigating potential generational shifts in consumer behaviour.
In this study, we will focus our analysis on the United States. While interest rates and their impact on consumer behaviour are indeed relevant on a global scale, we have chosen to concentrate our research on the U.S. market. This approach allows us to delve deeply into the dynamics of interest rates, consumer spending, and saving habits within a specific economic context. It is important to note that interest rates and their effects can vary significantly from one country to another due to differences in economic policies, financial systems, and cultural factors. By focusing exclusively on the United States, we aim to provide a more precise and in-depth analysis of the relationship between interest rates and consumer behaviour within this specific geographical space.
\(\color{darkblue}{\text{2.2. Project objectives}}\)
· To examine the relationship between interest rates and consumer spending and saving habits in the USA.
· To identify the key factors that influence how interest rates affect consumer spending and saving habits in the USA.
· To assess the implications of recent changes in interest rates in the USA for consumer spending and saving habits.
· To investigate the difference in behaviours between generations in the USA.
\(\color{darkblue}{\text{2.3. Research questions}}\)
· How do interest rates affect consumers’ saving and home-buying habits?
· How do interest rates affect consumers’ spending habits?
· Can changes in consumer sentiment, influenced by interest rates, be observed through surveys?
· Do these consumer behaviours differ over the years or between generations?
\(\color{darkblue}{\text{2.4. Reasoning for dataset segmentation}}\)
The dataset has been partitioned into 5 distinct periods aligning with significant economic events, policy changes, or distinct market conditions. This segmentation allows for a focused analysis of how varying economic environments within these timeframes potentially impact the variables under investigation.
1- 01/06/1972 - 01/09/1974
2- 01/04/1977 - 01/01/1981
3- 01/10/1993 - 01/05/1995
4- 01/09/2003 - 01/01/2007
5- 01/10/2021 - 01/08/2023
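As a sketch, the segmentation above could be applied in R with a helper function; the function name and the use of dplyr::case_when() are illustrative assumptions, not part of the original analysis. Dates outside the five windows are mapped to NA.

```r
library(dplyr)

# Hypothetical helper assigning each observation to one of the five
# analysis periods defined above; dates outside the windows get NA
assign_period <- function(date) {
  case_when(
    date >= as.Date("1972-06-01") & date <= as.Date("1974-09-01") ~ "Period 1",
    date >= as.Date("1977-04-01") & date <= as.Date("1981-01-01") ~ "Period 2",
    date >= as.Date("1993-10-01") & date <= as.Date("1995-05-01") ~ "Period 3",
    date >= as.Date("2003-09-01") & date <= as.Date("2007-01-01") ~ "Period 4",
    date >= as.Date("2021-10-01") & date <= as.Date("2023-08-01") ~ "Period 5",
    TRUE ~ NA_character_
  )
}

assign_period(as.Date("2022-03-01"))  # "Period 5"
```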
\(\color{darkblue}{\text{2.5. Methodology}}\)
Linear regression modelling:
Using a range of economic indicators and variables, multiple linear regression models will be constructed to examine the correlations within specific areas of interest.
Analysis focus:
1- Investigation into economic factors.
2- Study of consumption patterns.
3- Exploration of psychological well-being.
Comparative analysis:
An examination of the model outcomes will be performed to identify which factors have a significant influence across the studied domains.
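The modelling step described above can be sketched as follows. The data frame and column names here are hypothetical stand-ins, simulated only so the call is runnable; the real variables come from the sources listed in this report.

```r
set.seed(42)

# Simulated stand-in for the merged monthly data set
# (hypothetical column names, not the actual data)
monthly_data <- data.frame(
  Interest_Rate     = runif(120, 0, 8),
  Cpi_Rate          = runif(120, 1, 6),
  Unemployment_Rate = runif(120, 3, 10)
)
monthly_data$Personal_Saving_Rate <-
  5 + 0.4 * monthly_data$Interest_Rate -
  0.2 * monthly_data$Unemployment_Rate + rnorm(120, sd = 0.5)

# Multiple linear regression of saving behaviour on economic indicators
model_economic <- lm(
  Personal_Saving_Rate ~ Interest_Rate + Cpi_Rate + Unemployment_Rate,
  data = monthly_data
)
summary(model_economic)  # coefficients, R-squared, p-values
```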
The data will be extracted from the sources below:
· US Bureau of Labor Statistics
· Federal Reserve Economic Data
· GSS Data Explorer
\(\color{darkblue}{\text{2.7. Benefits and focus}}\) \(\color{darkblue}{\text{of the research}}\)
While acknowledging the extensive literature on interest rates and their economic impacts, our research dives into nuanced areas and aims to provide insights into consumer behaviour, particularly in relation to mental health, spending habits, and sentiment shifts influenced by interest rate fluctuations. By narrowing down the focus, our project aims to contribute specific insights within these domains. This approach leaves room for a more nuanced understanding of how interest rates intricately influence consumer behaviour beyond the sort of “traditional” economic indicators. Through our investigation, we wish to uncover potential socioeconomic implications associated with interest rate fluctuations. This includes understanding how changes in interest rates may impact not only economic decisions but also mental well-being and broader consumer sentiments, putting the emphasis on the close link between finance and individuals’ holistic well-being. These insights can aid policymakers, financial institutions, and individuals in making informed decisions, potentially informing policies or interventions aimed at improving financial health and well-being amidst economic fluctuations.
Our project aims to contribute by providing nuanced perspectives and empirical evidence within specific behavioural and mental health domains. This contribution can augment the existing body of knowledge on the multifaceted impacts of interest rates on consumer behaviour and well-being.
\(\color{darkblue}{\text{3. Data}}\)
\(\color{darkblue}{\text{3.1. FED Interest Rate}}\)
The Federal Reserve interest rate, specifically the federal funds rate, is the primary tool used by the US central bank to influence the nation’s economic conditions. It is the rate at which depository institutions lend reserve balances to other depository institutions overnight, and it is set in pursuit of the Fed’s objectives, which typically include fostering economic growth, maintaining price stability and maximizing employment. This figure provides a historical overview of the Fed’s monetary policy as reflected in the monthly interest rates from January 1970 through September 2023. In the raw data, Date is stored as a character; this column is therefore converted to a Date variable using the as.Date() function.
Source : https://www.atlantafed.org/research/inflationproject/stickyprice?tagid=e03d865d-f7d9-4a86-bc50-8a931d2e457b
Code
Interest_rate <- read.csv(here::here("data/Federal Funds Effective Rate.csv"))
#reactable(Interest_rate, sortable = TRUE, searchable = TRUE)
Interest_rate$Date <- as.Date(Interest_rate$Date, format="%d/%m/%Y")
dygraph(Interest_rate, main = "Fed Interest Rate") |>
dyRangeSelector(dateWindow = c("1970-01-01", "2023-09-01"))
\(\color{darkblue}{\text{3.2. CPI Sticky Price Rate}}\)
The Sticky Price Consumer Price Index (CPI) focuses on specific goods and services that exhibit limited price adjustments within the CPI basket. In the provided table, it highlights the changes in prices on a monthly and three-monthly basis for these select items, showcasing their stable pricing behavior over time.
Source: https://www.atlantafed.org/research/inflationproject/stickyprice?tagid=e03d865d-f7d9-4a86-bc50-8a931d2e457b
Code
Cpi_rate <- read.csv(here::here("data/stickyprice.csv"))
#reactable(Cpi_rate, sortable = TRUE, searchable = TRUE)
Cpi_rate$Date <- as.Date(Cpi_rate$Date, format="%d/%m/%Y")
dygraph(Cpi_rate[,1:2], main = "CPI Rate") |>
dyRangeSelector(dateWindow = c("1970-01-01", "2023-09-01"))
\(\color{darkblue}{\text{3.3. Unemployment Rate}}\)
The unemployment rate refers to the percentage of people who are actively seeking employment but are currently unemployed. It is a vital labor market indicator that reflects the economic health of the country. A higher unemployment rate suggests that a larger proportion of the workforce is jobless, while a lower rate indicates a healthier job market.
Source: https://fred.stlouisfed.org/series/UNRATE
Code
Unemployment_rate <- read.csv(here::here("data/UNRATE.csv"))
Unemployment_rate$Date <- as.Date(Unemployment_rate$Date, format="%d/%m/%Y")
dygraph(Unemployment_rate, main = "Unemployment Rate") |>
dyRangeSelector(dateWindow = c("1970-01-01", "2023-09-01"))
\(\color{darkblue}{\text{3.4. US National Home Price Index}}\)
The US national home price index is a key metric that monitors changes in the prices of residential properties throughout the United States. It offers a comprehensive view of the real estate market’s performance, making it a valuable resource for homeowners, real estate professionals and policymakers.
Source : https://fred.stlouisfed.org/series/CSUSHPISA
Code
Home_price_index <- read.csv(here::here("data/U.S. National Home Price Index.csv"))
Home_price_index$Date <- as.Date(Home_price_index$Date, format="%d/%m/%Y")
dygraph(Home_price_index, main = "US National Home Price Index") |>
dyRangeSelector(dateWindow = c("1985-01-01", "2023-09-01"))
\(\color{darkblue}{\text{3.5. Home Ownership}}\)
The home ownership rate, often simply referred to as the homeownership rate, is a statistical measure that indicates the percentage of households in the US that own their homes, typically through purchase or mortgage, as opposed to renting. This rate is a crucial indicator of housing and economic trends, providing insight into the prevalence of home ownership.
Source : https://fred.stlouisfed.org/series/RSAHORUSQ156S
Code
Home_ownership <- read.csv(here::here("data/Homeownership Rate in the United States.csv"))
Home_ownership$Date <- as.Date(Home_ownership$Date, format="%d/%m/%Y")
reactable(Home_ownership,
sortable = TRUE,
searchable = TRUE)
As can be seen in the table, data is not available for every month. However, monthly data is required for modelling purposes. To obtain monthly data, a time sequence running from the earliest to the latest date is first created. Then, the left_join() function is used to attach the corresponding values from the Home Ownership data set. The result of this operation is a data frame that contains all dates.
Code
full_dates <- seq(min(Home_ownership$Date), max(Home_ownership$Date), by = "1 month")
# Create a data frame with the full sequence of dates
full_df <- data.frame(Date = full_dates)
library(dplyr)
# Merge the full data frame with your existing data
merged__house_price <- full_df |>
left_join(Home_ownership, by = "Date")
reactable(merged__house_price,
sortable = TRUE,
searchable = TRUE)
However, for many dates the data set still displays “NA” values. To eliminate this problem, the fill() function is used. The result of this process is presented in the table below.
Code
library(tidyr) # fill() is provided by tidyr
Home_ownership_complete <- merged__house_price |> fill(Home_Ownership_Rate)
reactable(Home_ownership_complete,
sortable = TRUE,
searchable = TRUE)
\(\color{darkblue}{\text{3.6. 30Y Mortgage Rate}}\)
The mortgage rate average in the USA represents the typical interest rate that borrowers can expect to pay when obtaining a mortgage loan for home purchase or a refinancing; it serves as a benchmark for the cost of borrowing in the housing market.
Source: https://fred.stlouisfed.org/series/MORTGAGE30US (Mortgage Rate Average in the USA)
Code
Mortgage_rate <- read.csv(here::here("data/MORTGAGE30US.csv"))
Mortgage_rate$Date <- as.Date(Mortgage_rate$Date, format="%d/%m/%Y")
reactable(Mortgage_rate,
sortable = TRUE,
searchable = TRUE)
As can be seen in the table above, there is more than one observation for each month. Since only one value is required per month, the monthly average is computed using the group_by() and summarize() functions. The result can be seen in the table below.
Code
df_mortgage <- Mortgage_rate
df_mortgage$Date <- format(df_mortgage$Date, "%Y-%m")
Mortgage_rate_complete <- df_mortgage |>
group_by(Date) |>
summarize(Average_Value = mean(Mortgage_30Y, na.rm = TRUE))
Mortgage_rate_complete$Date <- as.Date(paste(Mortgage_rate_complete$Date, "01", sep = "-"))
reactable(Mortgage_rate_complete,
sortable = TRUE,
searchable = TRUE)
\(\color{darkblue}{\text{3.7. Personal Saving Rate}}\)
The personal saving rate is a financial indicator that measures the portion of a person’s or household’s disposable income (income after taxes) that is saved rather than spent on consumption. It is typically expressed as a percentage indicating how much individuals or households save relative to their total income.
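As a toy arithmetic illustration of this definition (hypothetical numbers, not taken from the data):

```r
# Saving rate = savings as a share of disposable (after-tax)
# income, expressed as a percentage; the figures are made up
disposable_income <- 4000  # monthly income after taxes
amount_saved      <- 300   # amount not spent on consumption
saving_rate <- amount_saved / disposable_income * 100
saving_rate  # 7.5
```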
Source : https://fred.stlouisfed.org/series/PSAVERT
Code
Personal_saving_rate <- read.csv(here::here("data/Personal Saving Rate.csv"))
Personal_saving_rate$Date <- as.Date(Personal_saving_rate$Date, format="%d/%m/%Y")
dygraph(Personal_saving_rate, main = "Personal Saving Rate") |>
dyRangeSelector(dateWindow = c("1970-01-01", "2023-09-01"))
\(\color{darkblue}{\text{3.8. Personal Consumption}}\)
The personal consumption typically refers to the total expenditure on goods and services by individuals or households within a specific region or country. It’s a fundamental component of a nation’s Gross Domestic Product (GDP) and provides insights into consumer behavior and economic activity. This metric encompasses various consumer expenditures, such as spending on food, housing, transportation, and other goods and services. This table summarizes a measure of spending on goods and services purchased by, and on behalf of, households based on households’ state of residence in the fifty states and the District of Columbia.
Source : https://fred.stlouisfed.org/graph/?m=QzLB#
Code
Consumption_expenditures <- read.csv(here::here("data/PCE.csv"))
Consumption_expenditures$Date <- as.Date(Consumption_expenditures$Date, format="%d/%m/%Y")
dygraph(Consumption_expenditures, main = "Consumption Expenditures") |>
dyRangeSelector(dateWindow = c("1970-01-01", "2023-09-01"))
\(\color{darkblue}{\text{3.9. Unemployment Rate of States}}\)
The importance of the unemployment rate was discussed earlier. To gain a better understanding of the data, the Unemployment Rate by State table was created. This table describes the monthly unemployment rate in each state.
Source: https://fred.stlouisfed.org/series/CAUR#
Code
Unemploy <- read.csv(here::here("data/1976-01-01 to 2023-08-01 Unemployment Rate by State (Percent).csv"))
reactable(Unemploy,
sortable = TRUE,
searchable = TRUE)
\(\color{darkblue}{\text{3.10. Personal Consumption Expenditures by State}}\)
The importance of Personal Consumption Expenditures was discussed earlier. To gain a better understanding of the data, the Personal Consumption Expenditures by State table was created. This table describes the monthly personal consumption expenditures in each state.
Source: https://fred.stlouisfed.org/graph/?m=QzLB#
Code
Personal_consumption_by_state <- read.csv(here::here("data/1997-01-01 to 2021-01-01 Personal Consumption Expenditures.csv"))
reactable(Personal_consumption_by_state,
sortable = TRUE,
searchable = TRUE)
\(\color{darkblue}{\text{3.11. House Price Index by State}}\)
The importance of the House Price Index was discussed earlier. To gain a better understanding of the data, the House Price Index by State table was created. This table describes the monthly house price index in each state.
Source: https://fred.stlouisfed.org/series/CASTHPI#
Code
House_price_index_state <- read.csv(here::here("data/1975-01-01 to 2023-04-01 All-Transactions House Price Index by State.csv"))
reactable(House_price_index_state,
sortable = TRUE,
searchable = TRUE)
\(\color{darkblue}{\text{3.12. Job Security}}\)
Job security is presented as a table, showcasing the results of a survey. The table offers a detailed breakdown of survey responses. This representation allows for a comprehensive understanding of job security based on participant feedback. The table displays granular data, including responses over the years in terms of qualitative data related to job security sentiments.
Source: https://gssdataexplorer.norc.org
Code
job_security_survey <- read.csv(here::here("data/the job security is good.csv"))
str_job_security_survey <- capture.output(unique(job_security_survey))
kable(head((job_security_survey),20))
| year | id_ | ballot |
|---|---|---|
| 1972 | 1 | .i: Inapplicable |
| 1973 | 1 | .i: Inapplicable |
| 1974 | 1 | .i: Inapplicable |
| 1975 | 1 | .i: Inapplicable |
| 1976 | 1 | .i: Inapplicable |
| 1977 | 1 | .i: Inapplicable |
| 1978 | 1 | .i: Inapplicable |
| 1980 | 1 | .i: Inapplicable |
| 1982 | 1 | .i: Inapplicable |
| 1983 | 1 | .i: Inapplicable |
| 1984 | 1 | .i: Inapplicable |
| 1985 | 1 | .i: Inapplicable |
| 1986 | 1 | .i: Inapplicable |
| 1987 | 1 | .i: Inapplicable |
| 1988 | 1 | Ballot b |
| 1989 | 1 | Ballot b |
| 1990 | 1 | Ballot b |
| 1991 | 1 | Ballot a |
| 1993 | 1 | Ballot c |
| 1994 | 1 | Ballot b |
The “Ballot” column signifies the response provided during interviews, while the “Year” corresponds to the year of the General Social Survey (GSS) for each respondent, identified by their unique ID number. Spanning from 1972 to 2022, this dataset encompasses a total of 72,390 responses. Consistent participation in the survey is evident for respondent ID number 1, with a continual presence across most years. However, there are noticeable gaps, such as the absence of data for the year 1992. Furthermore, while some respondents exhibit consistent attendance, a significant portion of IDs only appear for specific years. The “Ballot” category indicates respondents’ answers, with further explanations provided in the table below for clarity.
Code
job_security_is_good_variables_names <- read.csv(here::here("data/job_security_is_good_variables_names.csv"))
kable(job_security_is_good_variables_names)
| Code | ballot | Meaning |
|---|---|---|
| 1 | Ballot a | Very true |
| 2 | Ballot b | Somewhat true |
| 3 | Ballot c | Not too true |
| 4 | Ballot d | Not at all true |
| -100 | .i: Inapplicable | NA |
| -99 | .n: No answer | NA |
| -98 | .d: Do not Know/Cannot Choose | NA |
| -97 | .s: Skipped on Web | NA |
| -96 | .z: Variable-specific reserve code | NA |
| -95 | .u: Uncodable | NA |
| -90 | .r: Refused | NA |
| -80 | .x: Not available in this release | NA |
| -70 | .y: Not available in this year | NA |
| -60 | .j: I do not have a job | NA |
| -50 | .p: Not applicable (I have not faced this decision)/Not imputable | NA |
| -40 | .m: DK, NA, IAP | NA |
As indicated before, the job survey data is incomplete and has many missing values. To obtain a more complete dataset, the complete() function was used, and the result is presented in the table below.
Code
job_security_survey_complete <- job_security_survey|> complete(year,id_)
kable(head((job_security_survey_complete),20))
| year | id_ | ballot |
|---|---|---|
| 1972 | 1 | .i: Inapplicable |
| 1972 | 2 | .i: Inapplicable |
| 1972 | 3 | .i: Inapplicable |
| 1972 | 4 | .i: Inapplicable |
| 1972 | 5 | .i: Inapplicable |
| 1972 | 6 | .i: Inapplicable |
| 1972 | 7 | .i: Inapplicable |
| 1972 | 8 | .i: Inapplicable |
| 1972 | 9 | .i: Inapplicable |
| 1972 | 10 | .i: Inapplicable |
| 1972 | 11 | .i: Inapplicable |
| 1972 | 12 | .i: Inapplicable |
| 1972 | 13 | .i: Inapplicable |
| 1972 | 14 | .i: Inapplicable |
| 1972 | 15 | .i: Inapplicable |
| 1972 | 16 | .i: Inapplicable |
| 1972 | 17 | .i: Inapplicable |
| 1972 | 18 | .i: Inapplicable |
| 1972 | 19 | .i: Inapplicable |
| 1972 | 20 | .i: Inapplicable |
After completion of the dataset, the number of rows increased from 72,390 to 153,340.
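As a minimal sketch of what complete() from tidyr does here, a toy survey with one missing year-and-respondent combination can be expanded (the toy data frame is hypothetical, not drawn from the GSS data):

```r
library(tidyr)

# Toy survey with a gap: respondent 2 is missing in 1973,
# so complete() adds that row with ballot = NA
toy <- data.frame(
  year   = c(1972, 1972, 1973),
  id_    = c(1, 2, 1),
  ballot = c("Ballot a", "Ballot b", "Ballot c")
)
toy_complete <- complete(toy, year, id_)
nrow(toy_complete)  # 4: every year x id_ combination
```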
The final distribution can be seen in Figure 1. To interpret the ballot values and assign numerical codes for the linear regression model, new columns were created based on the ballot-meanings table above. For this purpose, the left_join() function was used, and the results are presented in the table below.
Code
job_security_survey_complete_with_varnames <- left_join(job_security_survey_complete, job_security_is_good_variables_names, join_by(ballot))
job_security_survey_complete_clean_for_table <- subset(job_security_survey_complete_with_varnames, !is.na(job_security_survey_complete_with_varnames$Meaning)) # first finding the rows without NA in column "Meaning"
reactable(job_security_survey_complete_clean_for_table,
sortable = TRUE,
searchable = TRUE)
The previous table was created by taking the ballot variable as a reference; the Meaning variable is therefore “NA” for ballot values marked “Inapplicable”. Thus, the number of “NA” values in the dataset can easily be seen by considering the Meaning variable alone. Figure 1 was created to show how many NA values the dataset contains.
Code
ggplot(job_security_survey_complete_with_varnames, aes(x = Meaning, fill = Meaning )) + scale_x_discrete(label = function(x) stringr::str_trunc(x, 20)) + geom_bar() + theme(axis.text.x = element_text(angle = 90, vjust = 1, hjust=1)) + xlab("The job security is good") + ylab("Number of Responses")
The data set underwent thorough preprocessing to ensure data integrity. Missing values (NAs) can be managed through techniques such as imputation or removal of incomplete entries; here, it was decided to drop the “NA” values. The Meaning variable was used as the criterion, and the subset() function performed the filtering. The final results are presented in Figure 2.
Code
job_security_survey_complete_with_varnames_clean <- subset(job_security_survey_complete_with_varnames, !is.na(job_security_survey_complete_with_varnames$Meaning))
ggplot(job_security_survey_complete_with_varnames_clean, aes(x = Meaning, fill = Meaning )) + scale_x_discrete(label = function(x) stringr::str_trunc(x, 20)) + geom_bar() + theme(axis.text.x = element_text(angle = 90, vjust = 1, hjust=1)) + xlab("The job security is good") + ylab("Number of Responses")
\(\color{darkblue}{\text{3.13. Mental Health}}\)
The data set comes from a survey capturing self-reported experiences of poor mental health within the past 30 days. Respondents provided information regarding the frequency of such occurrences. Additionally, the data set has been segmented into distinct periods aligning with significant temporal milestones or economic events (referenced earlier in this report) to analyse potential variations in reported mental health instances across these periods.
Source: https://gssdataexplorer.norc.org
Code
mental_health <- read.csv(here::here("data/days of poor mental health past 30 days.csv"))
kable(head((mental_health),20))
| year | id_ | mntlhlth | ballot |
|---|---|---|---|
| 1972 | 1 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 2 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 3 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 4 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 5 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 6 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 7 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 8 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 9 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 10 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 11 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 12 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 13 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 14 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 15 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 16 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 17 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 18 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 19 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 20 | .i: Inapplicable | .i: Inapplicable |
Code
mental_health_complete <- mental_health|> complete(year,id_)
The data is available from 1972 to 2022. The Mental Health dataset is not complete and has many missing values. To obtain a complete dataset, the complete() function was used. After completion, the number of rows increased from 72,390 to 153,340. The final distribution can be seen in Figure 3.
Code
#unique(mental_health$mntlhlth)
ggplot(mental_health_complete, aes(x = mntlhlth)) + scale_x_discrete(label = function(x) stringr::str_trunc(x, 20)) + geom_bar(fill="lightblue") + theme(axis.text.x = element_text(angle = 90, vjust = 1, hjust=1)) + xlab("Days of poor mental health past 30 days") + ylab("Number of Responses") 
The data set underwent thorough preprocessing to ensure data integrity. Given the survey nature of the data, missing data points (NAs) were handled carefully to maintain data quality. First, the values other than a number of days were identified: all of the “Inapplicable” values and the answers “No answer”, “Do not Know/Cannot Choose” and “-97” are replaced with “NA”.
Code
mental_health[mental_health$mntlhlth==".i: Inapplicable"|mental_health$mntlhlth==".n: No answer" |mental_health$mntlhlth==".d: Do not Know/Cannot Choose"|mental_health$mntlhlth=="-97",3] <- NA
mental_health_complete <- mental_health|> complete(year,id_)
Then, by using the same strategy, all of the “NA” values are dropped, and the results are presented in Figure 4.
Code
mental_health_complete_clean <- subset(mental_health_complete, !is.na(mental_health_complete$mntlhlth))
mental_health_complete_clean$mntlhlth <- as.numeric(mental_health_complete_clean$mntlhlth)
ggplot(mental_health_complete_clean, aes(x = mntlhlth)) + geom_bar(fill="lightblue") + theme(axis.text.x = element_text(angle = 90, vjust = 1, hjust=1)) + xlab("Days of poor mental health past 30 days") + ylab("Number of Responses")
\(\color{darkblue}{\text{3.14. Income Alone}}\) \(\color{darkblue}{\text{is enough}}\)
The table showcases the survey responses exploring whether income alone is recognized as sufficient for individuals’ needs. It presents the count or frequency of responses regarding the adequacy of income to cover the necessities.
Source: https://gssdataexplorer.norc.org:
Code
income_alone <- read.csv(here::here("data/income alone is enough.csv"))
kable(head((income_alone),20))
| year | id_ | answers | ballot |
|---|---|---|---|
| 1972 | 1 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 2 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 3 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 4 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 5 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 6 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 7 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 8 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 9 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 10 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 11 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 12 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 13 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 14 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 15 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 16 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 17 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 18 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 19 | .i: Inapplicable | .i: Inapplicable |
| 1972 | 20 | .i: Inapplicable | .i: Inapplicable |
Coming from a survey data set, the information underwent preprocessing to ensure accuracy and overall completeness. The data is available from 1972 to 2022. Missing or inapplicable values were addressed, and the data set was completed by year and individual ID. The result can be seen in the figure below.
Code
income_alone_graph_complete <- income_alone|> complete(year,id_)
ggplot(income_alone_graph_complete, aes(x = answers, fill = answers)) + scale_x_discrete(label = function(x) stringr::str_trunc(x, 20)) + geom_bar() + theme(axis.text.x = element_text(angle = 90, vjust = 2, hjust=1))+ xlab("Is income alone enough?") + ylab("Number of Responses")
In order to find all the values other than the responses, the table below was created.
Code
reactable(data.frame("unique name" =unique(income_alone$answers)),
sortable = TRUE,
searchable = TRUE)
The dataset was carefully cleansed to manage missing values and ensure the data quality. This cleaning process aimed to provide a reliable description of the responses regarding the income adequacy. All the “Inapplicable” values and all the answers as “No answer”, “Do not Know/Cannot Choose” and “Skipped on Web” are replaced with “NA”.
Code
income_alone[income_alone$answers==".i: Inapplicable"|income_alone$answers==".n: No answer" |income_alone$answers==".d: Do not Know/Cannot Choose"|income_alone$answers==".s: Skipped on Web",3] <- NA
income_alone_complete <- income_alone|> complete(year,id_)
The responses are visualised as a bar chart generated with R’s ggplot() function. The x-axis represents the categories of income-adequacy responses, while the y-axis shows the count of responses.
Code
income_alone_complete <- income_alone|> complete(year,id_)
income_alone_complete_clean <- subset(income_alone_complete, !is.na(income_alone_complete$answers))
ggplot(income_alone_complete_clean, aes(x = answers, fill = answers)) + scale_x_discrete(label = function(x) stringr::str_trunc(x, 20)) + geom_bar() + theme(axis.text.x = element_text(angle = 90, vjust = 2, hjust=1))+ xlab("Is income alone enough?") + ylab("Number of Responses")
Through the cleaning process, the table offers insights into individuals’ perceptions regarding the sufficiency of income alone to meet their needs, which provides an overview of responses and any potential trends when we consider income adequacy.
\(\color{darkblue}{\text{3.15. Satisfaction with}}\) \(\color{darkblue}{\text{Financial Situation}}\)
This table displays the survey responses assessing the individuals’ satisfaction levels regarding their financial situations. It presents the count or frequency of responses that indicates the varying degrees of satisfaction with financial circumstances.
Source: https://gssdataexplorer.norc.org
Code
satis_financial <- read.csv(here::here("data/satisfaction with financial situation.csv"))
kable(head(satis_financial, 20))
| year | id_ | satfin | ballot |
|---|---|---|---|
| 1972 | 1 | Not satisfied at all | .i: Inapplicable |
| 1972 | 2 | More or less satisfied | .i: Inapplicable |
| 1972 | 3 | Pretty well satisfied | .i: Inapplicable |
| 1972 | 4 | Not satisfied at all | .i: Inapplicable |
| 1972 | 5 | Pretty well satisfied | .i: Inapplicable |
| 1972 | 6 | More or less satisfied | .i: Inapplicable |
| 1972 | 7 | More or less satisfied | .i: Inapplicable |
| 1972 | 8 | Not satisfied at all | .i: Inapplicable |
| 1972 | 9 | More or less satisfied | .i: Inapplicable |
| 1972 | 10 | Not satisfied at all | .i: Inapplicable |
| 1972 | 11 | Not satisfied at all | .i: Inapplicable |
| 1972 | 12 | Pretty well satisfied | .i: Inapplicable |
| 1972 | 13 | Pretty well satisfied | .i: Inapplicable |
| 1972 | 14 | More or less satisfied | .i: Inapplicable |
| 1972 | 15 | Pretty well satisfied | .i: Inapplicable |
| 1972 | 16 | More or less satisfied | .i: Inapplicable |
| 1972 | 17 | Pretty well satisfied | .i: Inapplicable |
| 1972 | 18 | Pretty well satisfied | .i: Inapplicable |
| 1972 | 19 | More or less satisfied | .i: Inapplicable |
| 1972 | 20 | Pretty well satisfied | .i: Inapplicable |
As survey data, the information underwent preprocessing to ensure accuracy. The data are available from 1972 to 2022. The dataset was filtered to include complete records by year as well as by individual ID, to ensure the integrity of the data.
Code
satis_financial_complete_for_graph <- satis_financial|> complete(year,id_)
ggplot(satis_financial_complete_for_graph, aes(x = satfin, fill = satfin))+ scale_x_discrete(label = function(x) stringr::str_trunc(x, 20)) + geom_bar() + theme(axis.text.x = element_text(angle = 90, vjust = 1, hjust=1)) + xlab("Satisfaction with financial situation") + ylab("Number of Responses")
Firstly, in order to find all of the values other than the satisfaction responses, the table presented below was created.
Code
reactable(data.frame("unique name" = unique(satis_financial$satfin)),
sortable = TRUE,
searchable = TRUE)
The dataset went through a cleaning process to manage the missing values. Techniques such as filtering for complete entries by specific criteria were applied to create a refined dataset that was more representative of the respondents’ satisfaction levels with their financial situations. All of the “Inapplicable” values and all the answers displayed as “No answer”, “Do not Know/Cannot Choose” and “Skipped on Web” are replaced with “NA”.
Code
satis_financial[satis_financial$satfin %in% c(".i: Inapplicable", ".n: No answer", ".d: Do not Know/Cannot Choose", ".s: Skipped on Web"), 3] <- NA
The table’s visual representation is a bar chart created using R’s ‘ggplot’ function. The x-axis shows the categories indicating different satisfaction levels with financial situations, and the y-axis shows the count of these responses.
Code
satis_financial_complete <- satis_financial|> complete(year,id_)
satis_financial_complete_clean <- subset(satis_financial_complete, !is.na(satis_financial_complete$satfin))
ggplot(satis_financial_complete_clean, aes(x = satfin, fill = satfin))+ scale_x_discrete(label = function(x) stringr::str_trunc(x, 20)) + geom_bar() + theme(axis.text.x = element_text(angle = 90, vjust = 1, hjust=1)) + xlab("Satisfaction with financial situation") + ylab("Number of Responses")
This table provides insights into individuals’ diverse satisfaction levels regarding their financial circumstances; it offers an overview of the responses and of potential trends in financial satisfaction perceptions among the respondents.
Another point to consider is the number of missing values encountered in the satisfaction-level responses, and indeed in the other survey datasets as well. It is important to consider the potential biases or limitations introduced by missing data and to understand how they might affect the representativeness of the conclusions we draw regarding satisfaction with financial situations.
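As a quick diagnostic of this concern, the share of missing responses per survey year can be computed directly. The sketch below uses hypothetical toy data with the same `year` and `satfin` columns as the cleaned survey table; it is an illustration, not part of the report’s pipeline.

```r
# Sketch: proportion of NA responses per year, base R only.
# The toy data below is hypothetical; in the report this role is played by
# satis_financial_complete with its `year` and `satfin` columns.
toy <- data.frame(
  year   = c(1972, 1972, 1972, 1973, 1973),
  satfin = c("Pretty well satisfied", NA, NA, "Not satisfied at all", NA)
)

# For each year, the mean of is.na() gives the missing share (in %).
# na.action = na.pass keeps the NA rows so they can be counted.
na_share <- aggregate(satfin ~ year, data = toy,
                      FUN = function(x) mean(is.na(x)) * 100,
                      na.action = na.pass)
```

Years in which this share spikes deserve extra caution when interpreting trends.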
\(\color{darkblue}{\text{4. Exploratory Data}}\) \(\color{darkblue}{\text{Analysis}}\)
\(\color{darkblue}{\text{4.1. Relationship between}}\) \(\color{darkblue}{\text{Fed Interest Rate,}}\) \(\color{darkblue}{\text{Unemployment}}\) \(\color{darkblue}{\text{and CPI}}\)
This graph shows the joint variation of the Fed interest rate, the unemployment rate and the CPI over the periods mentioned above. During this time frame, the U.S. economy went through various economic cycles, policy responses and external shocks. We note a certain convergence of these series during major events such as the severe recessions of the 1970s and 1980s, and the subprime and COVID-19 crises.
Code
interest_rate2 <- Interest_rate[-nrow(Interest_rate),]
Unemployment2 <- Unemployment_rate[265:908,] #it should start from 01/01/1970
Cpi_rate2 <- Cpi_rate[,c(1,2)]
#Home_ownership2 <- Home_ownership
interest_rate2$group <- c("Fed Interest Rate")
Unemployment2$group <- c("Unemployment")
Cpi_rate2$group <- c("Cpi_rate")
#Home_ownership2$group <- c("Home Ownership")
colnames(interest_rate2) <- c("Date","Value","Group")
colnames(Unemployment2) <- c("Date","Value","Group")
colnames(Cpi_rate2) <- c("Date","Value","Group")
#colnames(Home_ownership2) <- c("Date","Value","Group")
Cpi_rate2$Value <- Cpi_rate2$Value*100
# in order to scale home ownership use secondary y axis !!!!!!
int_cpi_unemploy_graph <- rbind(interest_rate2,Unemployment2,Cpi_rate2)
#int_cpi_unemploy_graph <- rbind(interest_rate2,Unemployment2,Home_ownership2)
hchart( int_cpi_unemploy_graph, "line", hcaes(x = Date, y = Value, group = Group) )
\(\color{darkblue}{\text{4.2. Relationship between}}\) \(\color{darkblue}{\text{Fed Interest Rate and}}\) \(\color{darkblue}{\text{Home Ownership Rate}}\)
These two graphs show the relationship between the Fed interest rate and the home ownership rate. Over this time frame, several trends and events affected both series. Interest rates followed a declining trend until the early 2000s, influenced by lower inflation and changes in monetary policy. The home ownership rate, on the other hand, rose through the 1980s and 1990s, peaking in the mid-2000s. The housing bubble created by subprime mortgages led to a subsequent decline in the home ownership rate.
Code
graph_interest_home <- inner_join(Interest_rate, Home_ownership, join_by(Date))
dat.m <- melt(graph_interest_home, measure.vars=c("Effective_Rate", "Home_Ownership_Rate"))
dat.m$Date <- as.Date(dat.m$Date, format="%d/%m/%Y")
ggplot(dat.m, aes(x=Date, y=value,group = variable))+ geom_line(aes(color = variable)) + facet_grid(variable~., scales="free_y")+ theme(axis.text.x = element_text(angle = 90, vjust = 1, hjust=1, size=5)) + scale_x_date(date_breaks = "2 year", date_labels = "%Y")
\(\color{darkblue}{\text{4.3. Unemployment Rate}}\) \(\color{darkblue}{\text{by state}}\)
The rate is represented on the dynamic map on a scale from 0 to 14%. Based on the Unemployment graph presented previously, three main dates were chosen: 01.01.1976, 01.01.2010 and 01.01.2023.
Source : https://fred.stlouisfed.org/series/CAUR#
Code
Unemploy2 <- Unemploy
c_names2 <- Unemploy2$Series.ID
Unemploy3 <- Unemploy2[-c(2, 12), ]
usa <- st_as_sf(maps::map("state", fill=TRUE, plot =FALSE))%>%
sf::st_transform('+proj=longlat +datum=WGS84')
usa$unemployment01 <- Unemploy3$X01.01.1976
usa$unemployment02 <- Unemploy3$X01.01.2010
usa$unemployment03 <- Unemploy3$X01.01.2023
pal <- colorBin("OrRd", 0:14)
leaflet(usa) %>% addProviderTiles("CartoDB.Voyager") %>%
addPolygons(fillColor =~ pal(usa$unemployment01), fillOpacity = 0.5 , weight = 1,popup= paste("State: ",usa$ID,"<br>","Unemployment Rate: ",usa$unemployment01,"<br>"),group = "01.01.1976") %>%
addPolygons(fillColor =~ pal(usa$unemployment02), fillOpacity = 0.5 , weight = 1,popup= paste("State: ",usa$ID,"<br>","Unemployment Rate: ",usa$unemployment02,"<br>"),group = "01.01.2010") %>%
addPolygons(fillColor =~ pal(usa$unemployment03), fillOpacity = 0.5 , weight = 1,popup= paste("State: ",usa$ID,"<br>","Unemployment Rate: ",usa$unemployment03,"<br>"),group = "01.01.2023") %>%
addLayersControl(baseGroups=c("01.01.1976","01.01.2010", "01.01.2023"),position = "bottomleft",options = layersControlOptions(collapsed = FALSE)) %>%
addLegend("bottomright",pal = pal, values=usa$unemployment01 ,title="Unemployment Rate",opacity=1)
As can be seen in the interactive graph above, the change in the unemployment rate differs across states. For example, the unemployment rate in Nebraska was 4.9 in 2010 and decreased to 2.5 in 2023, a total change of -48.98%. If California is considered instead, the values are 12.6 for 2010 and 4.2 for 2023, a total change of -66.67%. Therefore, it can be claimed that in the linear regression this variable may be dominated by states with large fluctuations in the unemployment rate. In addition, if a remarkable coefficient is observed for this variable, the estimation results will be more aligned with the real data of those states.
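The total changes quoted above follow the usual percent-change formula, (new − old) / old × 100. A minimal sketch, using the Nebraska figures from the map above:

```r
# Percent change between two observations of a rate.
pct_change <- function(old, new) (new - old) / old * 100

# Nebraska unemployment: 4.9 (2010) -> 2.5 (2023), roughly -48.98%.
nebraska_change <- pct_change(4.9, 2.5)
```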
\(\color{darkblue}{\text{4.4. Personal Consumption}}\) \(\color{darkblue}{\text{Expenditures by state}}\)
Dynamic map representing the metric that quantifies the total spending by individuals or households on various goods and services within specific states in the United States. It offers a detailed view of consumer behavior and economic activity at the state level.
Source: https://fred.stlouisfed.org/graph/?m=QzLB#
Code
Personal_consumption_by_state2 <- Personal_consumption_by_state
Personal_consumption_by_state3 <- Personal_consumption_by_state2[-c(2, 12), ]
usa_consumption <- st_as_sf(maps::map("state", fill=TRUE, plot =FALSE))%>%
sf::st_transform('+proj=longlat +datum=WGS84')
usa_consumption$consumption01 <- Personal_consumption_by_state3$X01.01.1997
usa_consumption$consumption02 <- Personal_consumption_by_state3$X01.01.2010
usa_consumption$consumption03 <- Personal_consumption_by_state3$X01.01.2021
pal <- colorBin("OrRd", seq(0,2100000))
leaflet(usa_consumption) %>% addProviderTiles("CartoDB.Voyager") %>%
addPolygons(fillColor =~ pal(usa_consumption$consumption01), fillOpacity = 0.5 , weight = 1,popup= paste("State: ",usa_consumption$ID,"<br>","Personal Consumption Expenditures:$ ",usa_consumption$consumption01,"<br>", "Millions"),group = "01.01.1997") %>%
addPolygons(fillColor =~ pal(usa_consumption$consumption02), fillOpacity = 0.5 , weight = 1,popup= paste("State: ",usa_consumption$ID,"<br>","Personal Consumption Expenditures:$ ",usa_consumption$consumption02,"<br>", "Millions"),group = "01.01.2010") %>%
addPolygons(fillColor =~ pal(usa_consumption$consumption03), fillOpacity = 0.5 , weight = 1,popup= paste("State: ",usa_consumption$ID,"<br>","Personal Consumption Expenditures: $ ",usa_consumption$consumption03,"<br>", "Millions"),group = "01.01.2021") %>%
addLayersControl(baseGroups=c("01.01.1997","01.01.2010", "01.01.2021"),position = "bottomleft",options = layersControlOptions(collapsed = FALSE)) %>%
addLegend("bottomright",pal = pal, values=usa_consumption$consumption01 ,title="Personal Consumption Expenditures (Millions of Dollars)",opacity=1)
This visualization enables us to better understand each state’s effect on personal consumption. Indeed, we observe a great disparity between the states: personal consumption in the state of California far outstrips the rest. Taking the previous example, Personal Consumption Expenditures in Nebraska were \(5.921 \times 10^{4}\) in 2010 and \(9.07 \times 10^{4}\) in 2021 (millions of dollars), a total change of 53.192%. For California, the corresponding values are \(1.29 \times 10^{6}\) and \(2.083 \times 10^{6}\) for 2010 and 2021, with a change of 38.048%. Although we do not have data beyond 2021, this is sufficient for some interpretation. The percentage change in Nebraska is higher than in California. However, we also need to consider each state’s share of total Personal Consumption in the USA. This plot therefore makes us aware that some states carry a greater weight than others on this variable, which must be considered in our linear regression; in this case, we can assume that the results may be driven largely by the state of California.
\(\color{darkblue}{\text{4.5. House Price Index}}\) \(\color{darkblue}{\text{by State}}\)
Dynamic map representing the key economic indicator that tracks changes in residential property prices within individual states across the United States. It provides a detailed perspective on real estate market performance, offering insights into regional housing market dynamics.
Source : https://fred.stlouisfed.org/series/CASTHPI#
Code
House_price_index_state2 <- House_price_index_state
House_price_index_state3 <- House_price_index_state2[-c(2, 12), ]
usa_house_price <- st_as_sf(maps::map("state", fill=TRUE, plot =FALSE))%>%
sf::st_transform('+proj=longlat +datum=WGS84')
usa_house_price$houseprice1 <- House_price_index_state3$X01.01.1997
usa_house_price$houseprice2 <- House_price_index_state3$X01.01.2010
usa_house_price$houseprice3 <- House_price_index_state3$X01.01.2023
pal <- colorBin("OrRd", 40:1200)
leaflet(usa_house_price) %>% addProviderTiles("CartoDB.Voyager") %>%
addPolygons(fillColor =~ pal(usa_house_price$houseprice1), fillOpacity = 0.5 , weight = 1,popup= paste("State: ",usa_house_price$ID,"<br>","House Price Index: ",usa_house_price$houseprice1,"<br>"),group = "01.01.1997") %>%
addPolygons(fillColor =~ pal(usa_house_price$houseprice2), fillOpacity = 0.5 , weight = 1,popup= paste("State: ",usa_house_price$ID,"<br>","House Price Index: ",usa_house_price$houseprice2,"<br>"),group = "01.01.2010") %>%
addPolygons(fillColor =~ pal(usa_house_price$houseprice3), fillOpacity = 0.5 , weight = 1,popup= paste("State: ",usa_house_price$ID,"<br>","House Price Index: ",usa_house_price$houseprice3,"<br>"),group = "01.01.2023") %>%
addLayersControl(baseGroups=c("01.01.1997","01.01.2010", "01.01.2023"),position = "bottomleft",options = layersControlOptions(collapsed = FALSE)) %>%
addLegend("bottomright",pal = pal, values=usa_house_price$houseprice1 ,title="House Price Index",opacity=1)
This visualization enables us to better understand each state’s effect on the House Price Index. Taking the previous example, the House Price Index in Nebraska was 247.11 in 2010 and 475.75 in 2023, a total change of 92.526%. For California, the values are 409.61 and 719.24 for 2010 and 2023, with a total change of 54.115%. Therefore, it can be claimed that although developed states are the main drivers of the total index, rural states dominate the percentage change of the House Price Index.
\(\color{darkblue}{\text{4.6. Job security is}}\) \(\color{darkblue}{\text{good survey}}\)
For a better understanding of the data, the figure below was created. Firstly, the dataset is grouped by year and the percentages are calculated for each year individually using the group_by() and summarize() functions. Then, for plotting purposes, the dataset is reshaped to long format with the pivot_longer() function, generating the columns that can be passed to the factor() mapping for the bar charts.
Code
yearly_result_jobsec <- job_security_survey_complete_with_varnames_clean %>%
group_by(year) %>%
summarize(
Very_true = sum(Meaning == "Very true", na.rm = TRUE) / n() * 100,
Somewhat_true = sum(Meaning == "Somewhat true", na.rm = TRUE) / n() * 100,
Not_too_true = sum(Meaning == "Not too true", na.rm = TRUE) / n() * 100,
Not_at_all_true = sum(Meaning == "Not at all true", na.rm = TRUE) / n() * 100
)
yearly_result_jobsec <- yearly_result_jobsec |>
pivot_longer(c(`Very_true`, `Somewhat_true`, `Not_too_true`, `Not_at_all_true`),
names_to = "Answers",
values_to = "Percentages")
ggplot(yearly_result_jobsec, aes(fill = Answers,
y = Percentages, x = as.character(year)))+
geom_bar(position = "stack", stat = "identity")+
theme(plot.title = element_text(hjust = 0.5)) + xlab("Year") +theme(axis.text.x = element_text(angle = 90, vjust = 1, hjust=1)) + ggtitle("Job security is good survey")
According to the graph, the peak share of the answer “Not at all true” is observed in 2006. Outside that year, the fluctuation in responses is very small. Therefore, if this variable proves effective in the linear model, a large coefficient can be expected, since a large coefficient is required to amplify the effect of such small changes in the variable.
\(\color{darkblue}{\text{4.7. Income Alone is}}\) \(\color{darkblue}{\text{enough survey}}\)
The yearly distribution of answer percentages is presented in the figure below. Firstly, the dataset is grouped by year and the percentages are calculated for each year individually using the group_by() and summarize() functions. Then, for plotting purposes, the dataset is reshaped to long format with the pivot_longer() function, generating the columns that can be passed to the factor() mapping for the bar charts.
Code
yearly_result_income_alone <- income_alone_complete_clean %>%
group_by(year) %>%
summarize(
Yes = sum(answers == "YES", na.rm = TRUE) / n() * 100,
No = sum(answers == "NO", na.rm = TRUE) / n() * 100
)
yearly_result_income_alone <- yearly_result_income_alone |>
pivot_longer(c(`Yes`, `No`),
names_to = "Answers",
values_to = "Percentages")
ggplot(yearly_result_income_alone, aes(fill = Answers,
y = Percentages, x = as.character(year)))+
geom_bar(position = "stack", stat = "identity")+
theme(plot.title = element_text(hjust = 0.5)) + xlab("Year") + ggtitle("Income alone is enough survey")
According to the bar charts, the fluctuation in responses is very small. Therefore, if this variable proves effective in the linear model, a large coefficient can be expected, since a large coefficient is required to amplify the effect of such small changes in the variable.
\(\color{darkblue}{\text{4.8. Satisfaction with}}\) \(\color{darkblue}{\text{Financial Situation survey}}\)
Firstly, the dataset is grouped by year and the percentages are calculated for each year individually using the group_by() and summarize() functions. Then, for plotting purposes, the dataset is reshaped to long format with the pivot_longer() function, generating the columns that can be passed to the factor() mapping for the bar charts.
Code
yearly_result_satfin <- satis_financial_complete_clean %>%
group_by(year) %>%
summarize(
Not_satisfied_at_all = sum(satfin == "Not satisfied at all", na.rm = TRUE) / n() * 100,
More_or_less_satisfied = sum(satfin == "More or less satisfied", na.rm = TRUE) / n() * 100,
Pretty_well_satisfied = sum(satfin == "Pretty well satisfied", na.rm = TRUE) / n() * 100
)
yearly_result_satfin <- yearly_result_satfin |>
pivot_longer(c(`Not_satisfied_at_all`, `More_or_less_satisfied`, `Pretty_well_satisfied`),
names_to = "Answers",
values_to = "Percentages")
ggplot(yearly_result_satfin, aes(fill = Answers,
y = Percentages, x = as.character(year)))+
geom_bar(position = "stack", stat = "identity")+
theme(plot.title = element_text(hjust = 0.5)) + xlab("Year") +theme(axis.text.x = element_text(angle = 90, vjust = 1, hjust=1)) + ggtitle("Satisfaction with financial situation survey")
According to this graph, more fluctuation in responses can be observed. Thus, it can be claimed that people are more sensitive to this question than to the questions presented previously. Therefore, when all the surveys are compared, this one may be the most aligned with economic conditions.
\(\color{darkblue}{\text{5. Modeling}}\)
The purpose of this report is to examine the relationship between interest rates and consumer spending and saving habits in the USA. Many fluctuating periods can be observed in the US economy: periods in which the interest rate increased consecutively, and periods in which it decreased consecutively. Because this report focuses on the effect of increasing interest rates in relation to other variables, the available data had to be filtered. In addition, the report focuses on the change in people’s behaviour over time. In order to highlight this change, five periods with a consecutive increase in the interest rate were chosen. These periods can be seen in the figure below.
Code
Interest_rate_1972_1974 <-Interest_rate[which(Interest_rate$Date=="1972-06-01"):which(Interest_rate$Date=="1974-09-01"),]
Interest_rate_1977_1981 <-Interest_rate[which(Interest_rate$Date=="1977-04-01"):which(Interest_rate$Date=="1981-01-01"),]
Interest_rate_1993_1995 <-Interest_rate[which(Interest_rate$Date=="1993-10-01"):which(Interest_rate$Date=="1995-05-01"),]
Interest_rate_2003_2007 <-Interest_rate[which(Interest_rate$Date=="2003-09-01"):which(Interest_rate$Date=="2007-01-01"),]
Interest_rate_2021_2023 <-Interest_rate[which(Interest_rate$Date=="2021-10-01"):which(Interest_rate$Date=="2023-08-01"),]
Interest_rate_1972_1974$group <- c("1972-1974")
Interest_rate_1977_1981$group <- c("1977-1981")
Interest_rate_1993_1995$group <- c("1993-1995")
Interest_rate_2003_2007$group <- c("2003-2007")
Interest_rate_2021_2023$group <- c("2021-2023")
interest_rate_all_periods <- rbind(Interest_rate_1972_1974,Interest_rate_1977_1981,Interest_rate_1993_1995,Interest_rate_2003_2007,Interest_rate_2021_2023)
diff <- setdiff(Interest_rate$Date,interest_rate_all_periods$Date) # find the rows that is not present in selected periods
hchart(
interest_rate_all_periods, "line",
hcaes(x = Date, y = Effective_Rate, group = group)
)
After determining these periods, all the datasets were divided into five different periods.
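The repeated subsetting by period that follows can be expressed as a small helper function. The sketch below is an illustration of the idea only; `filter_period` is a hypothetical name and is not part of the report’s code, which performs the equivalent subset() calls explicitly.

```r
# Hypothetical helper: restrict a data frame to one rate-hike period,
# keeping rows whose year falls within [start_year, end_year].
filter_period <- function(df, start_year, end_year, year_col = "year") {
  df[df[[year_col]] >= start_year & df[[year_col]] <= end_year, , drop = FALSE]
}

# Example with toy data spanning 1970-1980:
toy <- data.frame(year = 1970:1980, value = 1:11)
period_77_81 <- filter_period(toy, 1977, 1981)  # keeps 1977-1980 here
```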
\(\color{darkblue}{\text{5.1. Job security}}\)
As mentioned in the Data section, the Job Security survey only includes yearly values. However, using this data in the linear model requires monthly values. The preparation of the monthly values is discussed below. Firstly, the job security survey is divided into the five selected periods.
Code
job_security_survey_complete_1972_1974_clean <- subset(job_security_survey_complete_with_varnames_clean,job_security_survey_complete_with_varnames_clean$year>=1972
& job_security_survey_complete_with_varnames_clean$year<=1974)
job_security_survey_complete_1977_1981_clean <- subset(job_security_survey_complete_with_varnames_clean,job_security_survey_complete_with_varnames_clean$year>=1977
& job_security_survey_complete_with_varnames_clean$year<=1981)
job_security_survey_complete_1993_1995_clean <- subset(job_security_survey_complete_with_varnames_clean,job_security_survey_complete_with_varnames_clean$year>=1993
& job_security_survey_complete_with_varnames_clean$year<=1995)
job_security_survey_complete_2003_2007_clean <- subset(job_security_survey_complete_with_varnames_clean,job_security_survey_complete_with_varnames_clean$year>=2003
& job_security_survey_complete_with_varnames_clean$year<=2007)
#we have data until 2022
job_security_survey_complete_2021_2023_clean <- subset(job_security_survey_complete_with_varnames_clean,job_security_survey_complete_with_varnames_clean$year>=2021
& job_security_survey_complete_with_varnames_clean$year<=2023)
After the dividing process, the total count of each answer is calculated for each period for modeling purposes. The total counts for the 1993 - 1995 period can be seen in the table below.
Code
#cleaning NA numbers
#finding the counts of answers for each year
jsc_1993_1995_count <- job_security_survey_complete_1993_1995_clean |> group_by(year) |> count(Code)
jsc_2003_2007_count <- job_security_survey_complete_2003_2007_clean |> group_by(year) |> count(Code)
jsc_2021_2023_count <- job_security_survey_complete_2021_2023_clean |> group_by(year) |> count(Code)
kable(jsc_1993_1995_count)
| year | Code | n |
|---|---|---|
| 1993 | 1 | 549 |
| 1993 | 2 | 531 |
| 1993 | 3 | 526 |
| 1994 | 1 | 981 |
| 1994 | 2 | 996 |
| 1994 | 3 | 1015 |
The total count for the period 2003 - 2007 can be seen in the below table.
Code
kable(jsc_2003_2007_count)
| year | Code | n |
|---|---|---|
| 2004 | 1 | 952 |
| 2004 | 2 | 941 |
| 2004 | 3 | 919 |
| 2006 | 1 | 1003 |
| 2006 | 2 | 989 |
| 2006 | 3 | 1000 |
| 2006 | 4 | 1518 |
The total count for the period 2021 - 2023 can be seen in the below table.
Code
kable(jsc_2021_2023_count)
| year | Code | n |
|---|---|---|
| 2021 | 1 | 1360 |
| 2021 | 2 | 1357 |
| 2021 | 3 | 1315 |
| 2022 | 1 | 1173 |
| 2022 | 2 | 1203 |
| 2022 | 3 | 1168 |
After finding the total counts, the monthly values were created. It was assumed that the total count changes linearly between survey years. For example, if the total count of votes is 100 in 2003 and 220 in 2004, the total count increases by 10 each month. The same assumption holds for decreasing values. All monthly survey results were created with this approach.
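The interpolation described above can be sketched as follows, using the hypothetical counts from the example (100 votes in 2003, 220 in 2004, both assumed observed on 1 January):

```r
# Linear monthly interpolation between two yearly survey counts.
count_2003 <- 100
count_2004 <- 220
increment  <- (count_2004 - count_2003) / 12   # change per month: 10

# Monthly counts from Jan 2003 through Jan 2004 (13 points inclusive).
monthly <- seq(count_2003, count_2004, by = increment)
```

The same logic underlies the `increment_*` and `values_change_*` objects in the period-specific code below, which additionally shift the start and end months to match each rate-hike window.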
\(\color{darkblue}{\text{5.1.1. For the 1993-1995}}\) \(\color{darkblue}{\text{period}}\)
The monthly values for the 1993 - 1995 period can be observed in the below table.
Code
dates_1993_1995 = seq(from = as.Date("1993-10-01"), to = as.Date("1995-05-01"), by = 'month')
#create survey result values; it is assumed that each survey was held on 01/01 of the corresponding year
#because we do not have data for 1995, the 1994 counts are used
#in addition, it is assumed that each month it increases or decreases linearly
# the beginning date is 01/10/1993, and the end date is 01/05/1995, therefore we need to start from 10th month and end in 5th month
#for the survey results from 1993 to 1995
##answers as 1
increment_1993_1995_v1 <- round((jsc_1993_1995_count$n[4]-jsc_1993_1995_count$n[1])/12)
values_change_1993_1995_v1_jsc <- c(seq(jsc_1993_1995_count$n[1]+10*increment_1993_1995_v1, jsc_1993_1995_count$n[1]+29*increment_1993_1995_v1, by=increment_1993_1995_v1))
##answers as 2
increment_1993_1995_v2 <- round((jsc_1993_1995_count$n[5]-jsc_1993_1995_count$n[2])/12)
values_change_1993_1995_v2_jsc <- c(seq(jsc_1993_1995_count$n[2]+10*increment_1993_1995_v2, jsc_1993_1995_count$n[2]+29*increment_1993_1995_v2, by=increment_1993_1995_v2))
##answers as 3
increment_1993_1995_v3 <- round((jsc_1993_1995_count$n[6]-jsc_1993_1995_count$n[3])/12)
values_change_1993_1995_v3_jsc <- c(seq(jsc_1993_1995_count$n[3]+10*increment_1993_1995_v3, jsc_1993_1995_count$n[3]+29*increment_1993_1995_v3, by=increment_1993_1995_v3))
##answers as 4 - we do not have 4 for these years, it is accepted as 0
values_change_1993_1995_v4_jsc <- rep(0, length(values_change_1993_1995_v1_jsc))
#creating data frame for linear model input for 1993 to 1995
jsc_1993_1995_lminput <- data.frame(dates_1993_1995,values_change_1993_1995_v1_jsc,values_change_1993_1995_v2_jsc,values_change_1993_1995_v3_jsc,values_change_1993_1995_v4_jsc)
#finding the percentage of each answer
row_sum_jsc_93 <- rowSums(jsc_1993_1995_lminput[,2:5])
jsc_1993_1995_lminput[, 2:5] <- (jsc_1993_1995_lminput[, 2:5]/row_sum_jsc_93)*100
kable(head(jsc_1993_1995_lminput,10))
| dates_1993_1995 | values_change_1993_1995_v1_jsc | values_change_1993_1995_v2_jsc | values_change_1993_1995_v3_jsc | values_change_1993_1995_v4_jsc |
|---|---|---|---|---|
| 1993-10-01 | 32.9 | 33.3 | 33.8 | 0 |
| 1993-11-01 | 32.8 | 33.3 | 33.9 | 0 |
| 1993-12-01 | 32.7 | 33.3 | 34.0 | 0 |
| 1994-01-01 | 32.7 | 33.3 | 34.0 | 0 |
| 1994-02-01 | 32.6 | 33.3 | 34.1 | 0 |
| 1994-03-01 | 32.5 | 33.4 | 34.1 | 0 |
| 1994-04-01 | 32.5 | 33.4 | 34.1 | 0 |
| 1994-05-01 | 32.4 | 33.4 | 34.2 | 0 |
| 1994-06-01 | 32.4 | 33.4 | 34.2 | 0 |
| 1994-07-01 | 32.4 | 33.4 | 34.3 | 0 |
\(\color{darkblue}{\text{5.1.2. For the 2003-2007}}\) \(\color{darkblue}{\text{period}}\)
The monthly values for the 2003 - 2007 period can be seen in the below table.
Code
dates_2003_2007 = seq(from = as.Date("2003-09-01"), to = as.Date("2007-01-01"), by = 'month')
#for the survey results from 2003 to 2007; the same assumptions are valid for this calculation as well
# corresponding period is 01/09/2003 - 01/01/2007
##answers as 1
increment_2003_2007_v1 <- round((jsc_2003_2007_count$n[4]-jsc_2003_2007_count$n[1])/24) # we have 2004 and 2006 results, so we need to subtract 3 months to reach 01/09/2003
values_change_2003_2007_v1_jsc <- c(seq(jsc_2003_2007_count$n[1]-3*increment_2003_2007_v1, jsc_2003_2007_count$n[1]+37*increment_2003_2007_v1, by=increment_2003_2007_v1))
##answers as 2
increment_2003_2007_v2 <- round((jsc_2003_2007_count$n[5]-jsc_2003_2007_count$n[2])/24) # we have 2004 and 2006 results
values_change_2003_2007_v2_jsc <- c(seq(jsc_2003_2007_count$n[2]-3*increment_2003_2007_v2, jsc_2003_2007_count$n[2]+37*increment_2003_2007_v2, by=increment_2003_2007_v2))
##answers as 3
increment_2003_2007_v3 <- round((jsc_2003_2007_count$n[6]-jsc_2003_2007_count$n[3])/24) # we have 2004 and 2006 results
values_change_2003_2007_v3_jsc <- c(seq(jsc_2003_2007_count$n[3]-3*increment_2003_2007_v3, jsc_2003_2007_count$n[3]+37*increment_2003_2007_v3, by=increment_2003_2007_v3))
##answers as 4
increment_2003_2007_v4 <- round((jsc_2003_2007_count$n[7]-jsc_2003_2007_count$n[4])/24) # we have 2004 and 2006 results
values_change_2003_2007_v4_jsc <- c(seq(jsc_2003_2007_count$n[4]-3*increment_2003_2007_v4, jsc_2003_2007_count$n[4]+37*increment_2003_2007_v4, by=increment_2003_2007_v4))
#creating data frame for linear model input for 2003 to 2007
jsc_2003_2007_lminput <- data.frame(dates_2003_2007,values_change_2003_2007_v1_jsc,values_change_2003_2007_v2_jsc,values_change_2003_2007_v3_jsc,values_change_2003_2007_v4_jsc)
#finding the percentage of each answer
row_sum_jsc_03 <- rowSums(jsc_2003_2007_lminput[,2:5])
jsc_2003_2007_lminput[, 2:5] <- (jsc_2003_2007_lminput[, 2:5]/row_sum_jsc_03)*100
kable(head(jsc_2003_2007_lminput,10))
| dates_2003_2007 | values_change_2003_2007_v1_jsc | values_change_2003_2007_v2_jsc | values_change_2003_2007_v3_jsc | values_change_2003_2007_v4_jsc |
|---|---|---|---|---|
| 2003-09-01 | 25.4 | 25.1 | 24.4 | 25.2 |
| 2003-10-01 | 25.2 | 24.9 | 24.3 | 25.6 |
| 2003-11-01 | 25.1 | 24.8 | 24.2 | 25.9 |
| 2003-12-01 | 25.0 | 24.7 | 24.1 | 26.3 |
| 2004-01-01 | 24.8 | 24.5 | 24.0 | 26.6 |
| 2004-02-01 | 24.7 | 24.4 | 23.9 | 27.0 |
| 2004-03-01 | 24.6 | 24.3 | 23.8 | 27.3 |
| 2004-04-01 | 24.4 | 24.2 | 23.7 | 27.7 |
| 2004-05-01 | 24.3 | 24.0 | 23.6 | 28.0 |
| 2004-06-01 | 24.2 | 23.9 | 23.5 | 28.3 |
\(\color{darkblue}{\text{5.1.3. For the 2021-2023}}\) \(\color{darkblue}{\text{period}}\)
The monthly values for the 2021 - 2023 period can be observed in the below table.
Code
dates_2021_2023 = seq(from = as.Date("2021-10-01"), to = as.Date("2023-08-01"), by = 'month')
#for the survey results from 2021 to 2023; the same assumptions are valid for this calculation as well
# corresponding period is 01/10/2021 - 01/08/2023
##answers as 1
increment_2021_2023_v1 <- round((jsc_2021_2023_count$n[4]-jsc_2021_2023_count$n[1])/24) # we have 2021 and 2022 results, so we need to add 10 months to reach 01/10/2021
values_change_2021_2023_v1_jsc <- c(seq(jsc_2021_2023_count$n[1]+10*increment_2021_2023_v1, jsc_2021_2023_count$n[1]+32*increment_2021_2023_v1, by=increment_2021_2023_v1))
##answers as 2
increment_2021_2023_v2 <- round((jsc_2021_2023_count$n[5]-jsc_2021_2023_count$n[2])/24) # we have 2021 and 2022 results, so we need to add 10 months to reach 01/10/2021
values_change_2021_2023_v2_jsc <- c(seq(jsc_2021_2023_count$n[2]+10*increment_2021_2023_v2, jsc_2021_2023_count$n[2]+32*increment_2021_2023_v2, by=increment_2021_2023_v2))
##answers as 3
increment_2021_2023_v3 <- round((jsc_2021_2023_count$n[6]-jsc_2021_2023_count$n[3])/24) # we have 2021 and 2022 results, so we need to add 10 months to reach 01/10/2021
values_change_2021_2023_v3_jsc <- c(seq(jsc_2021_2023_count$n[3]+10*increment_2021_2023_v3, jsc_2021_2023_count$n[3]+32*increment_2021_2023_v3, by=increment_2021_2023_v3))
# we do not have value 4 for this period
values_change_2021_2023_v4_jsc <- rep(0, length(values_change_2021_2023_v1_jsc))
#creating data frame for linear model input for 2021 to 2023
jsc_2021_2023_lminput <- data.frame(dates_2021_2023,values_change_2021_2023_v1_jsc,values_change_2021_2023_v2_jsc,values_change_2021_2023_v3_jsc,values_change_2021_2023_v4_jsc)
#finding the percentage of each answer
row_sum_jsc_21 <- rowSums(jsc_2021_2023_lminput[,2:5])
jsc_2021_2023_lminput[, 2:5] <- (jsc_2021_2023_lminput[, 2:5]/row_sum_jsc_21)*100
kable(head(jsc_2021_2023_lminput,10))
| dates_2021_2023 | values_change_2021_2023_v1_jsc | values_change_2021_2023_v2_jsc | values_change_2021_2023_v3_jsc | values_change_2021_2023_v4_jsc |
|---|---|---|---|---|
| 2021-10-01 | 33.4 | 33.8 | 32.8 | 0 |
| 2021-11-01 | 33.4 | 33.9 | 32.8 | 0 |
| 2021-12-01 | 33.3 | 33.9 | 32.8 | 0 |
| 2022-01-01 | 33.3 | 33.9 | 32.8 | 0 |
| 2022-02-01 | 33.3 | 33.9 | 32.8 | 0 |
| 2022-03-01 | 33.2 | 34.0 | 32.8 | 0 |
| 2022-04-01 | 33.2 | 34.0 | 32.8 | 0 |
| 2022-05-01 | 33.2 | 34.0 | 32.9 | 0 |
| 2022-06-01 | 33.1 | 34.0 | 32.9 | 0 |
| 2022-07-01 | 33.1 | 34.0 | 32.9 | 0 |
\(\color{darkblue}{\text{5.2. Financial satisfaction}}\)
For the Financial Satisfaction survey, the same approach was applied, and all steps are described in the sections below. First, the data was divided into periods.
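The interpolation used throughout these sections follows one pattern: compute a rounded monthly increment from two survey counts, then step linearly through the modelling window. A minimal sketch of that pattern; the helper name `interpolate_monthly` and the call below are illustrative, not part of the analysis code:

```r
# Hypothetical helper summarising the interpolation pattern used below.
# n_start, n_end: answer counts at the first and last survey
# gap_months: months between the two surveys
# offset, width: start offset and length (in months) of the modelling window
interpolate_monthly <- function(n_start, n_end, gap_months, offset, width) {
  step <- round((n_end - n_start) / gap_months)  # rounded monthly increment
  n_start + step * seq(offset, length.out = width)
}

# counts 720 (1972) and 674 (1974), 24 months apart; a 28-month window
# starting 6 months after the first survey, as in the 1972-1974 block
head(interpolate_monthly(720, 674, 24, 6, 28))  # 708 706 704 702 700 698
```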
Code
satfin_survey_complete_1972_1974_clean <- subset(satis_financial_complete_clean,satis_financial_complete_clean$year>=1972
& satis_financial_complete_clean$year<=1974)
satfin_survey_complete_1977_1981_clean <- subset(satis_financial_complete_clean,satis_financial_complete_clean$year>=1977
& satis_financial_complete_clean$year<=1981)
satfin_survey_complete_1993_1995_clean <- subset(satis_financial_complete_clean,satis_financial_complete_clean$year>=1993
& satis_financial_complete_clean$year<=1995)
satfin_survey_complete_2003_2007_clean <- subset(satis_financial_complete_clean,satis_financial_complete_clean$year>=2003
& satis_financial_complete_clean$year<=2007)
#we have data until 2022
satfin_survey_complete_2021_2023_clean <- subset(satis_financial_complete_clean,satis_financial_complete_clean$year>=2021
& satis_financial_complete_clean$year<=2023)
The total count of each answer was calculated for modeling purposes.
Code
#cleaning NA numbers
#finding the counts of answers for each year
satfin_1972_1974_count <- satfin_survey_complete_1972_1974_clean |> group_by(year) |> count(satfin)
satfin_1977_1981_count <- satfin_survey_complete_1977_1981_clean |> group_by(year) |> count(satfin)
satfin_1993_1995_count <- satfin_survey_complete_1993_1995_clean |> group_by(year) |> count(satfin)
satfin_2003_2007_count <- satfin_survey_complete_2003_2007_clean |> group_by(year) |> count(satfin)
satfin_2021_2023_count <- satfin_survey_complete_2021_2023_clean |> group_by(year) |> count(satfin)
kable(satfin_1972_1974_count)
| year | satfin | n |
|---|---|---|
| 1972 | More or less satisfied | 720 |
| 1972 | Not satisfied at all | 366 |
| 1972 | Pretty well satisfied | 522 |
| 1973 | More or less satisfied | 683 |
| 1973 | Not satisfied at all | 357 |
| 1973 | Pretty well satisfied | 461 |
| 1974 | More or less satisfied | 674 |
| 1974 | Not satisfied at all | 343 |
| 1974 | Pretty well satisfied | 461 |
\(\color{darkblue}{\text{5.2.1. For the 1972-1974}}\) \(\color{darkblue}{\text{period}}\)
The monthly values for the 1972–1974 period are shown in the table below.
Code
dates_1972_1974 = seq(from = as.Date("1972-06-01"), to = as.Date("1974-09-01"), by = 'month')
#create survey result values; each survey is assumed to be held on 01/01 of the corresponding year
#in addition, it is assumed that each month it increases or decreases linearly
#for the survey results from 1972 to 1974
##answers as 1
increment_1972_1974_v1_satfin <- round((satfin_1972_1974_count$n[7]-satfin_1972_1974_count$n[1])/24)
values_change_1972_1974_v1_satfin <- c(seq(satfin_1972_1974_count$n[1]+6*increment_1972_1974_v1_satfin, satfin_1972_1974_count$n[1]+33*increment_1972_1974_v1_satfin, by=increment_1972_1974_v1_satfin))
##answers as 2
increment_1972_1974_v2_satfin <- round((satfin_1972_1974_count$n[8]-satfin_1972_1974_count$n[2])/24)
values_change_1972_1974_v2_satfin <- c(seq(satfin_1972_1974_count$n[2]+6*increment_1972_1974_v2_satfin, satfin_1972_1974_count$n[2]+33*increment_1972_1974_v2_satfin, by=increment_1972_1974_v2_satfin))
##answers as 3
increment_1972_1974_v3_satfin <- round((satfin_1972_1974_count$n[9]-satfin_1972_1974_count$n[3])/24)
values_change_1972_1974_v3_satfin <- c(seq(satfin_1972_1974_count$n[3]+6*increment_1972_1974_v3_satfin, satfin_1972_1974_count$n[3]+33*increment_1972_1974_v3_satfin, by=increment_1972_1974_v3_satfin))
#creating data frame for linear model input for 1972 to 1974
satfin_1972_1974_lminput <- data.frame(dates_1972_1974,values_change_1972_1974_v1_satfin,values_change_1972_1974_v2_satfin,values_change_1972_1974_v3_satfin)
#finding the percentage of each answer
row_sum_satfin_72 <- rowSums(satfin_1972_1974_lminput[,2:4])
satfin_1972_1974_lminput[, 2:4] <- (satfin_1972_1974_lminput[, 2:4]/row_sum_satfin_72)*100
kable(head(satfin_1972_1974_lminput,10))
| dates_1972_1974 | values_change_1972_1974_v1_satfin | values_change_1972_1974_v2_satfin | values_change_1972_1974_v3_satfin |
|---|---|---|---|
| 1972-06-01 | 45.0 | 22.9 | 32.1 |
| 1972-07-01 | 45.1 | 22.9 | 32.0 |
| 1972-08-01 | 45.1 | 22.9 | 31.9 |
| 1972-09-01 | 45.2 | 23.0 | 31.9 |
| 1972-10-01 | 45.2 | 23.0 | 31.8 |
| 1972-11-01 | 45.3 | 23.0 | 31.7 |
| 1972-12-01 | 45.3 | 23.0 | 31.6 |
| 1973-01-01 | 45.4 | 23.1 | 31.6 |
| 1973-02-01 | 45.4 | 23.1 | 31.5 |
| 1973-03-01 | 45.5 | 23.1 | 31.4 |
\(\color{darkblue}{\text{5.2.2. For the 1977-1981}}\) \(\color{darkblue}{\text{period}}\)
The monthly values for the 1977–1981 period are shown in the table below.
Code
dates_1977_1981 = seq(from = as.Date("1977-04-01"), to = as.Date("1981-01-01"), by = 'month')
#create survey result values; each survey is assumed to be held on 01/01 of the corresponding year
#in addition, it is assumed that each month it increases or decreases linearly
#for the survey results from 1977 to 1981
##answers as 1
increment_1977_1981_v1_satfin <- round((satfin_1977_1981_count$n[7]-satfin_1977_1981_count$n[1])/36)
values_change_1977_1981_v1_satfin <- c(seq(satfin_1977_1981_count$n[1]+4*increment_1977_1981_v1_satfin, satfin_1977_1981_count$n[1]+49*increment_1977_1981_v1_satfin, by=increment_1977_1981_v1_satfin))
##answers as 2
increment_1977_1981_v2_satfin <- round((satfin_1977_1981_count$n[8]-satfin_1977_1981_count$n[2])/36)
values_change_1977_1981_v2_satfin <- c(seq(satfin_1977_1981_count$n[2]+4*increment_1977_1981_v2_satfin, satfin_1977_1981_count$n[2]+49*increment_1977_1981_v2_satfin, by=increment_1977_1981_v2_satfin))
##answers as 3
increment_1977_1981_v3_satfin <- round((satfin_1977_1981_count$n[9]-satfin_1977_1981_count$n[3])/36)
values_change_1977_1981_v3_satfin <- c(seq(satfin_1977_1981_count$n[3]+4*increment_1977_1981_v3_satfin, satfin_1977_1981_count$n[3]+49*increment_1977_1981_v3_satfin, by=increment_1977_1981_v3_satfin))
#creating data frame for linear model input for 1977 to 1981
satfin_1977_1981_lminput <- data.frame(dates_1977_1981,values_change_1977_1981_v1_satfin,values_change_1977_1981_v2_satfin,values_change_1977_1981_v3_satfin)
#finding the percentage of each answer
row_sum_satfin_77 <- rowSums(satfin_1977_1981_lminput[,2:4])
satfin_1977_1981_lminput[, 2:4] <- (satfin_1977_1981_lminput[, 2:4]/row_sum_satfin_77)*100
kable(head(satfin_1977_1981_lminput,10))
| dates_1977_1981 | values_change_1977_1981_v1_satfin | values_change_1977_1981_v2_satfin | values_change_1977_1981_v3_satfin |
|---|---|---|---|
| 1977-04-01 | 43.8 | 22.7 | 33.5 |
| 1977-05-01 | 43.8 | 22.9 | 33.3 |
| 1977-06-01 | 43.8 | 23.0 | 33.1 |
| 1977-07-01 | 43.9 | 23.2 | 33.0 |
| 1977-08-01 | 43.9 | 23.3 | 32.8 |
| 1977-09-01 | 43.9 | 23.5 | 32.6 |
| 1977-10-01 | 43.9 | 23.6 | 32.4 |
| 1977-11-01 | 44.0 | 23.8 | 32.3 |
| 1977-12-01 | 44.0 | 23.9 | 32.1 |
| 1978-01-01 | 44.0 | 24.1 | 31.9 |
\(\color{darkblue}{\text{5.2.3. For the 1993-1995}}\) \(\color{darkblue}{\text{period}}\)
The monthly values for the 1993–1995 period are shown in the table below.
Code
dates_1993_1995 = seq(from = as.Date("1993-10-01"), to = as.Date("1995-05-01"), by = 'month')
#create survey result values; each survey is assumed to be held on 01/01 of the corresponding year
#because there is no data for 1995, 1994 is used instead
#in addition, it is assumed that each month it increases or decreases linearly
#for the survey results from 1993 to 1995
##answers as 1
increment_1993_1995_v1_satfin <- round((satfin_1993_1995_count$n[4]-satfin_1993_1995_count$n[1])/12)
values_change_1993_1995_v1_satfin <- c(seq(satfin_1993_1995_count$n[1]+10*increment_1993_1995_v1_satfin, satfin_1993_1995_count$n[1]+29*increment_1993_1995_v1_satfin, by=increment_1993_1995_v1_satfin))
##answers as 2
increment_1993_1995_v2_satfin <- round((satfin_1993_1995_count$n[5]-satfin_1993_1995_count$n[2])/12)
values_change_1993_1995_v2_satfin <- c(seq(satfin_1993_1995_count$n[2]+10*increment_1993_1995_v2_satfin, satfin_1993_1995_count$n[2]+29*increment_1993_1995_v2_satfin, by=increment_1993_1995_v2_satfin))
##answers as 3
increment_1993_1995_v3_satfin <- round((satfin_1993_1995_count$n[6]-satfin_1993_1995_count$n[3])/12)
values_change_1993_1995_v3_satfin <- c(seq(satfin_1993_1995_count$n[3]+10*increment_1993_1995_v3_satfin, satfin_1993_1995_count$n[3]+29*increment_1993_1995_v3_satfin, by=increment_1993_1995_v3_satfin))
#creating data frame for linear model input for 1993 to 1995
satfin_1993_1995_lminput <- data.frame(dates_1993_1995,values_change_1993_1995_v1_satfin,values_change_1993_1995_v2_satfin,values_change_1993_1995_v3_satfin)
#finding the percentage of each answer
row_sum_satfin_93 <- rowSums(satfin_1993_1995_lminput[,2:4])
satfin_1993_1995_lminput[, 2:4] <- (satfin_1993_1995_lminput[, 2:4]/row_sum_satfin_93)*100
kable(head(satfin_1993_1995_lminput,10))
| dates_1993_1995 | values_change_1993_1995_v1_satfin | values_change_1993_1995_v2_satfin | values_change_1993_1995_v3_satfin |
|---|---|---|---|
| 1993-10-01 | 45.6 | 26.4 | 28.0 |
| 1993-11-01 | 45.6 | 26.2 | 28.1 |
| 1993-12-01 | 45.7 | 26.1 | 28.2 |
| 1994-01-01 | 45.7 | 26.0 | 28.2 |
| 1994-02-01 | 45.8 | 26.0 | 28.3 |
| 1994-03-01 | 45.8 | 25.9 | 28.3 |
| 1994-04-01 | 45.9 | 25.8 | 28.3 |
| 1994-05-01 | 45.9 | 25.7 | 28.4 |
| 1994-06-01 | 45.9 | 25.6 | 28.4 |
| 1994-07-01 | 46.0 | 25.6 | 28.5 |
\(\color{darkblue}{\text{5.2.4. For the 2003-2007}}\) \(\color{darkblue}{\text{period}}\)
The monthly values for the 2003–2007 period are shown in the table below.
Code
dates_2003_2007 = seq(from = as.Date("2003-09-01"), to = as.Date("2007-01-01"), by = 'month')
#create survey result values; each survey is assumed to be held on 01/01 of the corresponding year
#in addition, it is assumed that each month it increases or decreases linearly
#for the survey results from 2003 to 2007
##answers as 1
increment_2003_2007_v1_satfin <- round((satfin_2003_2007_count$n[4]-satfin_2003_2007_count$n[1])/24)
values_change_2003_2007_v1_satfin <- c(seq(satfin_2003_2007_count$n[1]+9*increment_2003_2007_v1_satfin, satfin_2003_2007_count$n[1]+49*increment_2003_2007_v1_satfin, by=increment_2003_2007_v1_satfin))
##answers as 2
increment_2003_2007_v2_satfin <- round((satfin_2003_2007_count$n[5]-satfin_2003_2007_count$n[2])/12)
values_change_2003_2007_v2_satfin <- c(seq(satfin_2003_2007_count$n[2]+9*increment_2003_2007_v2_satfin, satfin_2003_2007_count$n[2]+49*increment_2003_2007_v2_satfin, by=increment_2003_2007_v2_satfin))
##answers as 3
increment_2003_2007_v3_satfin <- round((satfin_2003_2007_count$n[6]-satfin_2003_2007_count$n[3])/12)
values_change_2003_2007_v3_satfin <- c(seq(satfin_2003_2007_count$n[3]+9*increment_2003_2007_v3_satfin, satfin_2003_2007_count$n[3]+49*increment_2003_2007_v3_satfin, by=increment_2003_2007_v3_satfin))
#creating data frame for linear model input for 2003 to 2007
satfin_2003_2007_lminput <- data.frame(dates_2003_2007,values_change_2003_2007_v1_satfin,values_change_2003_2007_v2_satfin,values_change_2003_2007_v3_satfin)
#finding the percentage of each answer
row_sum_satfin_03 <- rowSums(satfin_2003_2007_lminput[,2:4])
satfin_2003_2007_lminput[, 2:4] <- (satfin_2003_2007_lminput[, 2:4]/row_sum_satfin_03)*100
kable(head(satfin_2003_2007_lminput,10))
| dates_2003_2007 | values_change_2003_2007_v1_satfin | values_change_2003_2007_v2_satfin | values_change_2003_2007_v3_satfin |
|---|---|---|---|
| 2003-09-01 | 36.7 | 29.2 | 34.0 |
| 2003-10-01 | 36.4 | 29.4 | 34.2 |
| 2003-11-01 | 36.1 | 29.5 | 34.3 |
| 2003-12-01 | 35.8 | 29.7 | 34.5 |
| 2004-01-01 | 35.6 | 29.8 | 34.6 |
| 2004-02-01 | 35.3 | 29.9 | 34.7 |
| 2004-03-01 | 35.1 | 30.1 | 34.8 |
| 2004-04-01 | 34.9 | 30.2 | 34.9 |
| 2004-05-01 | 34.7 | 30.3 | 35.0 |
| 2004-06-01 | 34.5 | 30.3 | 35.1 |
\(\color{darkblue}{\text{5.2.5. For the 2021-2023}}\) \(\color{darkblue}{\text{period}}\)
The monthly values for the 2021–2023 period are shown in the table below.
Code
#create survey result values; each survey is assumed to be held on 01/01 of the corresponding year
#in addition, it is assumed that each month it increases or decreases linearly
#for the survey results from 2021 to 2023
##answers as 1
increment_2021_2023_v1_satfin <- round((satfin_2021_2023_count$n[4]-satfin_2021_2023_count$n[1])/24)
values_change_2021_2023_v1_satfin <- c(seq(satfin_2021_2023_count$n[1]+10*increment_2021_2023_v1_satfin, satfin_2021_2023_count$n[1]+32*increment_2021_2023_v1_satfin, by=increment_2021_2023_v1_satfin))
##answers as 2
increment_2021_2023_v2_satfin <- round((satfin_2021_2023_count$n[5]-satfin_2021_2023_count$n[2])/12)
values_change_2021_2023_v2_satfin <- c(seq(satfin_2021_2023_count$n[2]+10*increment_2021_2023_v2_satfin, satfin_2021_2023_count$n[2]+32*increment_2021_2023_v2_satfin, by=increment_2021_2023_v2_satfin))
##answers as 3
increment_2021_2023_v3_satfin <- round((satfin_2021_2023_count$n[6]-satfin_2021_2023_count$n[3])/12)
values_change_2021_2023_v3_satfin <- c(seq(satfin_2021_2023_count$n[3]+10*increment_2021_2023_v3_satfin, satfin_2021_2023_count$n[3]+32*increment_2021_2023_v3_satfin, by=increment_2021_2023_v3_satfin))
#creating data frame for linear model input for 2021 to 2023
satfin_2021_2023_lminput <- data.frame(dates_2021_2023,values_change_2021_2023_v1_satfin,values_change_2021_2023_v2_satfin,values_change_2021_2023_v3_satfin)
#finding the percentage of each answer
row_sum_satfin_21 <- rowSums(satfin_2021_2023_lminput[,2:4])
satfin_2021_2023_lminput[, 2:4] <- (satfin_2021_2023_lminput[, 2:4]/row_sum_satfin_21)*100
kable(head(satfin_2021_2023_lminput,10))
| dates_2021_2023 | values_change_2021_2023_v1_satfin | values_change_2021_2023_v2_satfin | values_change_2021_2023_v3_satfin |
|---|---|---|---|
| 2021-10-01 | 45.9 | 28.7 | 25.5 |
| 2021-11-01 | 46.0 | 29.2 | 24.8 |
| 2021-12-01 | 46.1 | 29.7 | 24.2 |
| 2022-01-01 | 46.2 | 30.2 | 23.6 |
| 2022-02-01 | 46.3 | 30.8 | 22.9 |
| 2022-03-01 | 46.5 | 31.3 | 22.2 |
| 2022-04-01 | 46.6 | 31.9 | 21.5 |
| 2022-05-01 | 46.7 | 32.4 | 20.8 |
| 2022-06-01 | 46.8 | 33.0 | 20.1 |
| 2022-07-01 | 47.0 | 33.6 | 19.4 |
\(\color{darkblue}{\text{5.3. Income alone}}\)
For the Income Alone survey, the same approach was also applied, and all steps are described in the sections below. First, the data was divided into periods.
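Because this survey has only YES/NO answers, the model input reduces to a single share series. A minimal sketch of that calculation with made-up monthly counts:

```r
# Made-up monthly counts of the two answers
yes_n <- c(827, 827, 827)
no_n  <- c(944, 943, 942)
# share of YES responses among all responses, in percent
share <- yes_n / (yes_n + no_n) * 100
round(share, 1)  # 46.7 46.7 46.7
```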
Code
#there are only data for 2006 and 2002, therefore 2002 is considered for base point as well.
income_alone_complete_2003_2007_clean <- subset(income_alone_complete_clean,income_alone_complete_clean$year>=2002
& income_alone_complete_clean$year<=2007)
#there are only data for 2022 and 2018, therefore 2018 is considered for base point as well.
income_alone_complete_2021_2023_clean <- subset(income_alone_complete_clean,income_alone_complete_clean$year>=2018
& income_alone_complete_clean$year<=2023)
The total count of each answer was calculated for modeling purposes.
Code
#cleaning NA numbers
#finding the counts of answers for each year
income_2003_2007_count <- income_alone_complete_2003_2007_clean |> group_by(year) |> count(answers)
income_2021_2023_count <- income_alone_complete_2021_2023_clean |> group_by(year) |> count(answers)
kable(income_2003_2007_count)
| year | answers | n |
|---|---|---|
| 2002 | NO | 944 |
| 2002 | YES | 827 |
| 2006 | NO | 881 |
| 2006 | YES | 827 |
\(\color{darkblue}{\text{5.3.1. For the 2003-2007}}\) \(\color{darkblue}{\text{period}}\)
The monthly values for the 2003–2007 period are shown in the table below.
Code
dates_2003_2007 = seq(from = as.Date("2003-09-01"), to = as.Date("2007-01-01"), by = 'month')
#create survey result values; each survey is assumed to be held on 01/01 of the corresponding year
#only 2002 and 2006 data are available, so 2002 is used as the base point
#in addition, it is assumed that each month it increases or decreases linearly
#for the survey results from 2003 to 2007
##answers as 1
increment_2003_2007_v1_income <- round((income_2003_2007_count$n[3]-income_2003_2007_count$n[1])/48)
values_change_2003_2007_v1_income <- c(seq(income_2003_2007_count$n[1]+21*increment_2003_2007_v1_income, income_2003_2007_count$n[1]+61*increment_2003_2007_v1_income, by=increment_2003_2007_v1_income))
##answers as 2
#increment_2003_2007_v2_income <- round((income_2003_2007_count$n[4]-income_2003_2007_count$n[2])/48)
#values_change_2003_2007_v2_income <- c(seq(income_2003_2007_count$n[2]+21*increment_2003_2007_v2_income, income_2003_2007_count$n[2]+61*increment_2003_2007_v2_income, by=increment_2003_2007_v2_income))
values_change_2003_2007_v2_income <- rep(income_2003_2007_count$n[2],length(dates_2003_2007))
#creating data frame for linear model input for 2003 to 2007
income_2003_2007_lminput <- data.frame(dates_2003_2007,(values_change_2003_2007_v1_income/(values_change_2003_2007_v2_income+values_change_2003_2007_v1_income))*100)
names(income_2003_2007_lminput) <- c('dates_2003_2007','Income_yes_responds')
kable(head(income_2003_2007_lminput,10))
| dates_2003_2007 | Income_yes_responds |
|---|---|
| 2003-09-01 | 52.7 |
| 2003-10-01 | 52.7 |
| 2003-11-01 | 52.7 |
| 2003-12-01 | 52.7 |
| 2004-01-01 | 52.6 |
| 2004-02-01 | 52.6 |
| 2004-03-01 | 52.6 |
| 2004-04-01 | 52.6 |
| 2004-05-01 | 52.5 |
| 2004-06-01 | 52.5 |
\(\color{darkblue}{\text{5.3.2. For the 2021-2023}}\) \(\color{darkblue}{\text{period}}\)
The monthly values for the 2021–2023 period are shown in the table below.
Code
#create survey result values; each survey is assumed to be held on 01/01 of the corresponding year
#only 2018 and 2022 data are available, so 2018 is used as the base point
#in addition, it is assumed that each month it increases or decreases linearly
#for the survey results from 2021 to 2023
##answers as 1
increment_2021_2023_v1_income <- round((income_2021_2023_count$n[3]-income_2021_2023_count$n[1])/48)
values_change_2021_2023_v1_income <- c(seq(income_2021_2023_count$n[1]+46*increment_2021_2023_v1_income, income_2021_2023_count$n[1]+68*increment_2021_2023_v1_income, by=increment_2021_2023_v1_income))
##answers as 2
increment_2021_2023_v2_income <- round((income_2021_2023_count$n[4]-income_2021_2023_count$n[2])/48)
values_change_2021_2023_v2_income <- c(seq(income_2021_2023_count$n[2]+46*increment_2021_2023_v2_income, income_2021_2023_count$n[2]+68*increment_2021_2023_v2_income, by=increment_2021_2023_v2_income))
#creating data frame for linear model input for 2021 to 2023
income_2021_2023_lminput <- data.frame(dates_2021_2023,(values_change_2021_2023_v1_income/(values_change_2021_2023_v2_income+values_change_2021_2023_v1_income))*100)
names(income_2021_2023_lminput) <- c('dates_2021_2023','Income_yes_responds')
kable(head(income_2021_2023_lminput,10))
| dates_2021_2023 | Income_yes_responds |
|---|---|
| 2021-10-01 | 50.7 |
| 2021-11-01 | 50.7 |
| 2021-12-01 | 50.7 |
| 2022-01-01 | 50.7 |
| 2022-02-01 | 50.8 |
| 2022-03-01 | 50.8 |
| 2022-04-01 | 50.8 |
| 2022-05-01 | 50.8 |
| 2022-06-01 | 50.9 |
| 2022-07-01 | 50.9 |
\(\color{darkblue}{\text{5.4. Mental Health}}\)
For the Mental Health survey, the same approach was also applied, and all steps are described in the sections below. First, the data was divided into periods.
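For this survey the yearly summaries are continuous averages rather than counts, so the monthly step need not be rounded. A sketch interpolating between the two yearly averages reported below (4.4 and 3.0); the 24-month gap is an assumption of this sketch, while the analysis code divides by 48:

```r
# Yearly averages of the mntlhlth variable (from the 2004 and 2006 surveys)
avg_2004 <- 4.4
avg_2006 <- 3.0
# continuous monthly step between the two surveys (here taken as 24 months
# apart); the analysis code uses a divisor of 48 instead
step <- (avg_2006 - avg_2004) / 24
monthly <- seq(avg_2004, avg_2006, by = step)
length(monthly)  # 25 monthly values
```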
Code
#there are only data for 2004 and 2006 for this period
mental_health_complete_2003_2007_clean <- subset(mental_health_complete_clean,mental_health_complete_clean$year>=2003
& mental_health_complete_clean$year<=2007)
#there are only data for 2022 and 2018, therefore 2018 is considered for base point as well.
mental_health_complete_2021_2023_clean <- subset(mental_health_complete_clean,mental_health_complete_clean$year>=2018
& mental_health_complete_clean$year<=2023)
For this survey, however, the monthly average value was considered, and the monthly data was created using the same approach.
Code
#cleaning NA numbers
#finding the counts of answers for each year
mental_2003_2007_count <- mental_health_complete_2003_2007_clean |> group_by(year) |> summarize(Average_Value = mean(mntlhlth, na.rm = TRUE))
mental_2021_2023_count <- mental_health_complete_2021_2023_clean |> group_by(year) |> summarize(Average_Value = mean(mntlhlth, na.rm = TRUE))
kable(mental_2003_2007_count)
| year | Average_Value |
|---|---|
| 2004 | 4.4 |
| 2006 | 3.0 |
\(\color{darkblue}{\text{5.4.1. For the 2003-2007}}\) \(\color{darkblue}{\text{period}}\)
The monthly values for the 2003–2007 period are shown in the table below.
Code
dates_2003_2007 = seq(from = as.Date("2003-09-01"), to = as.Date("2007-01-01"), by = 'month')
#create survey result values; each survey is assumed to be held on 01/01 of the corresponding year
#in addition, it is assumed that each month it increases or decreases linearly
#for the survey results from 2003 to 2007
##answers as 1
increment_2003_2007_v1_mental <- ((mental_2003_2007_count$Average_Value[2]-mental_2003_2007_count$Average_Value[1])/48)
values_change_2003_2007_v1_mental <- c(seq(mental_2003_2007_count$Average_Value[1]-3*increment_2003_2007_v1_mental, mental_2003_2007_count$Average_Value[1]+37*increment_2003_2007_v1_mental, by=increment_2003_2007_v1_mental))
#creating data frame for linear model input for 2003 to 2007
mental_2003_2007_lminput <- data.frame(dates_2003_2007,values_change_2003_2007_v1_mental)
kable(head(mental_2003_2007_lminput,10))
| dates_2003_2007 | values_change_2003_2007_v1_mental |
|---|---|
| 2003-09-01 | 4.49 |
| 2003-10-01 | 4.46 |
| 2003-11-01 | 4.43 |
| 2003-12-01 | 4.40 |
| 2004-01-01 | 4.37 |
| 2004-02-01 | 4.34 |
| 2004-03-01 | 4.31 |
| 2004-04-01 | 4.28 |
| 2004-05-01 | 4.25 |
| 2004-06-01 | 4.22 |
\(\color{darkblue}{\text{5.4.2. For the 2021-2023}}\) \(\color{darkblue}{\text{period}}\)
The monthly values for the 2021–2023 period are shown in the table below.
Code
#create survey result values; each survey is assumed to be held on 01/01 of the corresponding year
#in addition, it is assumed that each month it increases or decreases linearly
#for the survey results from 2021 to 2023
##answers as 1
increment_2021_2023_v1_mental <- ((mental_2021_2023_count$Average_Value[2]-mental_2021_2023_count$Average_Value[1])/48)
values_change_2021_2023_v1_mental <- c(seq(mental_2021_2023_count$Average_Value[1]+46*increment_2021_2023_v1_mental, mental_2021_2023_count$Average_Value[1]+68*increment_2021_2023_v1_mental, by=increment_2021_2023_v1_mental))
#creating data frame for linear model input for 2021 to 2023
mental_2021_2023_lminput <- data.frame(dates_2021_2023,values_change_2021_2023_v1_mental)
kable(head(mental_2021_2023_lminput,10))
| dates_2021_2023 | values_change_2021_2023_v1_mental |
|---|---|
| 2021-10-01 | 4.51 |
| 2021-11-01 | 4.53 |
| 2021-12-01 | 4.55 |
| 2022-01-01 | 4.58 |
| 2022-02-01 | 4.59 |
| 2022-03-01 | 4.62 |
| 2022-04-01 | 4.64 |
| 2022-05-01 | 4.66 |
| 2022-06-01 | 4.68 |
| 2022-07-01 | 4.70 |
\(\color{darkblue}{\text{5.5. Creation of input data}}\) \(\color{darkblue}{\text{set for linear models}}\)
In line with the research questions stated in this report, four outcome variables are investigated: Home Ownership, Personal Saving Rate, Personal Consumption and Mental Health. Apart from Mental Health, each variable was split into periods using the which() function.
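The which() slicing below assumes that both endpoint dates exist in each series; a slightly more defensive equivalent filters on the date range directly. The helper name `slice_period` is illustrative, not part of the analysis code:

```r
# Hypothetical helper: subset a data frame with a Date column to a
# closed date range; still works if an endpoint month is absent
slice_period <- function(df, start, end) {
  df[df$Date >= as.Date(start) & df$Date <= as.Date(end), ]
}

# e.g. slice_period(Home_ownership_complete, "1993-10-01", "1995-05-01")
```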
Code
#divide the data set according to determined periods.
Home_ownership_1993_1995 <-Home_ownership_complete[which(Home_ownership_complete$Date=="1993-10-01"):which(Home_ownership_complete$Date=="1995-05-01"),]
Home_ownership_2003_2007 <-Home_ownership_complete[which(Home_ownership_complete$Date=="2003-09-01"):which(Home_ownership_complete$Date=="2007-01-01"),]
# for this period the maximum date is considered for comparison
Home_ownership_2021_2023 <-Home_ownership_complete[which(Home_ownership_complete$Date=="2021-10-01"):which(Home_ownership_complete$Date=="2023-04-01"),]
#Personal saving rate separation of data
Personal_saving_rate_1972_1974 <-Personal_saving_rate[which(Personal_saving_rate$Date=="1973-06-01"):which(Personal_saving_rate$Date=="1975-09-01"),]
Personal_saving_rate_1977_1981 <-Personal_saving_rate[which(Personal_saving_rate$Date=="1978-04-01"):which(Personal_saving_rate$Date=="1982-01-01"),]
Personal_saving_rate_1993_1995 <-Personal_saving_rate[which(Personal_saving_rate$Date=="1994-10-01"):which(Personal_saving_rate$Date=="1996-05-01"),]
Personal_saving_rate_2003_2007 <-Personal_saving_rate[which(Personal_saving_rate$Date=="2004-09-01"):which(Personal_saving_rate$Date=="2008-01-01"),]
# for this period the maximum date is considered for comparison
Personal_saving_rate_2021_2023 <-Personal_saving_rate[which(Personal_saving_rate$Date=="2021-10-01"):which(Personal_saving_rate$Date=="2023-08-01"),]
#Consumption expenditures separation of data
Consumption_expenditures_1972_1974 <-Consumption_expenditures[which(Consumption_expenditures$Date=="1972-06-01"):which(Consumption_expenditures$Date=="1974-09-01"),]
Consumption_expenditures_1977_1981 <-Consumption_expenditures[which(Consumption_expenditures$Date=="1977-04-01"):which(Consumption_expenditures$Date=="1981-01-01"),]
Consumption_expenditures_1993_1995 <-Consumption_expenditures[which(Consumption_expenditures$Date=="1993-10-01"):which(Consumption_expenditures$Date=="1995-05-01"),]
Consumption_expenditures_2003_2007 <-Consumption_expenditures[which(Consumption_expenditures$Date=="2003-09-01"):which(Consumption_expenditures$Date=="2007-01-01"),]
Consumption_expenditures_2021_2023 <-Consumption_expenditures[which(Consumption_expenditures$Date=="2021-10-01"):which(Consumption_expenditures$Date=="2023-08-01"),]
After the outcome variables were prepared, the input data sets were assembled in the sections below.
\(\color{darkblue}{\text{5.5.1. Input data set for}}\) \(\color{darkblue}{\text{the 1972 – 1974 period}}\)
The input variables are: Fed Interest rate + CPI + Unemployment + Mortgage + Financial Satisfaction
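The join chains in this and the following subsections all follow the same shape and can be expressed compactly by folding left_join() over a list of inputs. A self-contained sketch with made-up tables (the table names and values here are illustrative):

```r
library(dplyr)

# Made-up monthly tables sharing a Date key
d  <- as.Date(c("1972-06-01", "1972-07-01"))
t1 <- data.frame(Date = d, rate  = c(4.5, 4.6))
t2 <- data.frame(Date = d, cpi   = c(0.3, 0.4))
t3 <- data.frame(Date = d, unemp = c(5.7, 5.6))

# fold left_join over the list instead of chaining it line by line
joined <- Reduce(function(x, y) left_join(x, y, by = "Date"), list(t1, t2, t3))
names(joined)  # "Date" "rate" "cpi" "unemp"
```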
Code
Input_linear_model_72 <- NULL
Input_linear_model_72 <- Interest_rate[Interest_rate$Date %in% dates_1972_1974,]
Input_linear_model_72 <- left_join(Input_linear_model_72, Cpi_rate[,c("Date","Flexible_CPI_.monthly.")])
Input_linear_model_72 <- left_join(Input_linear_model_72,Unemployment_rate)
Input_linear_model_72 <- left_join(Input_linear_model_72, Mortgage_rate_complete, by = "Date")
names(satfin_1972_1974_lminput)[names(satfin_1972_1974_lminput) == 'dates_1972_1974'] <- 'Date'
Input_linear_model_72 <- left_join(Input_linear_model_72, satfin_1972_1974_lminput, by = "Date")
reactable(Input_linear_model_72,
sortable = TRUE,
searchable = TRUE)
\(\color{darkblue}{\text{5.5.2. Input data set for}}\) \(\color{darkblue}{\text{the 1977 – 1981 period}}\)
The input variables are: Fed Interest rate + CPI + Unemployment + Mortgage + Financial Satisfaction
Code
Input_linear_model_77 <- NULL
Input_linear_model_77 <- Interest_rate[Interest_rate$Date %in% dates_1977_1981,]
Input_linear_model_77 <- left_join(Input_linear_model_77, Cpi_rate[,c("Date","Flexible_CPI_.monthly.")])
Input_linear_model_77 <- left_join(Input_linear_model_77,Unemployment_rate)
Input_linear_model_77 <- left_join(Input_linear_model_77, Mortgage_rate_complete)
names(satfin_1977_1981_lminput)[names(satfin_1977_1981_lminput) == 'dates_1977_1981'] <- 'Date'
Input_linear_model_77 <- left_join(Input_linear_model_77, satfin_1977_1981_lminput, by = "Date")
reactable(Input_linear_model_77,
sortable = TRUE,
searchable = TRUE)
\(\color{darkblue}{\text{5.5.3. Input data set for}}\) \(\color{darkblue}{\text{the 1993 – 1995 period}}\)
The input variables are: Fed Interest rate + CPI + Unemployment + Home Price Index + Mortgage + Job Security + Financial Satisfaction
Code
Input_linear_model_93 <- NULL
Input_linear_model_93 <- Interest_rate[Interest_rate$Date %in% dates_1993_1995,]
Input_linear_model_93 <- left_join(Input_linear_model_93, Cpi_rate[,c("Date","Flexible_CPI_.monthly.")])
Input_linear_model_93 <- left_join(Input_linear_model_93,Unemployment_rate)
Input_linear_model_93 <- left_join(Input_linear_model_93, Home_price_index)
Input_linear_model_93 <- left_join(Input_linear_model_93, Mortgage_rate_complete, by = "Date")
names(jsc_1993_1995_lminput)[names(jsc_1993_1995_lminput) == 'dates_1993_1995'] <- 'Date'
Input_linear_model_93 <- left_join(Input_linear_model_93, jsc_1993_1995_lminput, by = "Date")
names(satfin_1993_1995_lminput)[names(satfin_1993_1995_lminput) == 'dates_1993_1995'] <- 'Date'
Input_linear_model_93 <- left_join(Input_linear_model_93, satfin_1993_1995_lminput, by = "Date")
reactable(Input_linear_model_93,
sortable = TRUE,
searchable = TRUE)
\(\color{darkblue}{\text{5.5.4. Input data set for}}\) \(\color{darkblue}{\text{the 2003 – 2007 period}}\)
The input variables are: Fed Interest rate + CPI + Unemployment + Home Price Index + Mortgage + Job Security + Financial Satisfaction + Income Alone is enough
Code
Input_linear_model_2003 <- NULL
Input_linear_model_2003 <- Interest_rate[Interest_rate$Date %in% dates_2003_2007,]
Input_linear_model_2003 <- left_join(Input_linear_model_2003, Cpi_rate[,c("Date","Flexible_CPI_.monthly.")])
Input_linear_model_2003 <- left_join(Input_linear_model_2003,Unemployment_rate)
Input_linear_model_2003 <- left_join(Input_linear_model_2003, Home_price_index)
Input_linear_model_2003 <- left_join(Input_linear_model_2003, Mortgage_rate_complete, by = "Date")
names(jsc_2003_2007_lminput)[names(jsc_2003_2007_lminput) == 'dates_2003_2007'] <- 'Date'
Input_linear_model_2003 <- left_join(Input_linear_model_2003, jsc_2003_2007_lminput, by = "Date")
names(satfin_2003_2007_lminput)[names(satfin_2003_2007_lminput) == 'dates_2003_2007'] <- 'Date'
Input_linear_model_2003 <- left_join(Input_linear_model_2003, satfin_2003_2007_lminput, by = "Date")
names(income_2003_2007_lminput)[names(income_2003_2007_lminput) == 'dates_2003_2007'] <- 'Date'
Input_linear_model_2003 <- left_join(Input_linear_model_2003, income_2003_2007_lminput, by = "Date")
reactable(Input_linear_model_2003,
sortable = TRUE,
searchable = TRUE)
\(\color{darkblue}{\text{5.5.5. Input data set for}}\) \(\color{darkblue}{\text{the 2021 – 2023 period}}\)
The input variables are: Fed Interest rate + CPI + Unemployment + Home Price Index + Mortgage + Job Security + Financial Satisfaction + Income Alone is enough
Code
Input_linear_model_2021 <- NULL
Input_linear_model_2021 <- Interest_rate[Interest_rate$Date %in% dates_2021_2023,]
Input_linear_model_2021 <- left_join(Input_linear_model_2021, Cpi_rate[,c("Date","Flexible_CPI_.monthly.")])
Input_linear_model_2021$Flexible_CPI_.monthly.[3] <- Cpi_rate$Flexible_CPI_.monthly.[660]
Input_linear_model_2021 <- left_join(Input_linear_model_2021,Unemployment_rate)
Input_linear_model_2021 <- left_join(Input_linear_model_2021, Home_price_index)
Input_linear_model_2021 <- left_join(Input_linear_model_2021, Mortgage_rate_complete)
names(jsc_2021_2023_lminput)[names(jsc_2021_2023_lminput) == 'dates_2021_2023'] <- 'Date'
Input_linear_model_2021 <- left_join(Input_linear_model_2021, jsc_2021_2023_lminput, by = "Date")
names(satfin_2021_2023_lminput)[names(satfin_2021_2023_lminput) == 'dates_2021_2023'] <- 'Date'
Input_linear_model_2021 <- left_join(Input_linear_model_2021, satfin_2021_2023_lminput, by = "Date")
names(income_2021_2023_lminput)[names(income_2021_2023_lminput) == 'dates_2021_2023'] <- 'Date'
Input_linear_model_2021 <- left_join(Input_linear_model_2021, income_2021_2023_lminput, by = "Date")
reactable(Input_linear_model_2021,
sortable = TRUE,
searchable = TRUE)
\(\color{darkblue}{\text{5.6. Correlations}}\)
This section discusses the correlations between the input variables, which are summarized in the tables below.
Correlation of variables for the 1972 - 1974 period is presented in the below table.
Code
kable(cor(Input_linear_model_72[,-1]))
| | Effective_Rate | Flexible_CPI_.monthly. | Unemployment_rate | Average_Value | values_change_1972_1974_v1_satfin | values_change_1972_1974_v2_satfin | values_change_1972_1974_v3_satfin |
|---|---|---|---|---|---|---|---|
| Effective_Rate | 1.000 | 0.241 | -0.243 | 0.913 | 0.937 | 0.937 | -0.937 |
| Flexible_CPI_.monthly. | 0.241 | 1.000 | -0.239 | 0.214 | 0.275 | 0.275 | -0.275 |
| Unemployment_rate | -0.243 | -0.239 | 1.000 | 0.061 | -0.067 | -0.067 | 0.067 |
| Average_Value | 0.913 | 0.214 | 0.061 | 1.000 | 0.934 | 0.934 | -0.934 |
| values_change_1972_1974_v1_satfin | 0.937 | 0.275 | -0.067 | 0.934 | 1.000 | 1.000 | -1.000 |
| values_change_1972_1974_v2_satfin | 0.937 | 0.275 | -0.067 | 0.934 | 1.000 | 1.000 | -1.000 |
| values_change_1972_1974_v3_satfin | -0.937 | -0.275 | 0.067 | -0.934 | -1.000 | -1.000 | 1.000 |
Correlation of variables for the 1977 - 1981 period is presented in the below table.
Code
kable(cor(Input_linear_model_77[,-1]))
| | Effective_Rate | Flexible_CPI_.monthly. | Unemployment_rate | Average_Value | values_change_1977_1981_v1_satfin | values_change_1977_1981_v2_satfin | values_change_1977_1981_v3_satfin |
|---|---|---|---|---|---|---|---|
| Effective_Rate | 1.000 | 0.497 | 0.078 | 0.915 | 0.855 | 0.855 | -0.855 |
| Flexible_CPI_.monthly. | 0.497 | 1.000 | -0.356 | 0.384 | 0.479 | 0.479 | -0.479 |
| Unemployment_rate | 0.078 | -0.356 | 1.000 | 0.346 | 0.265 | 0.265 | -0.265 |
| Average_Value | 0.915 | 0.384 | 0.346 | 1.000 | 0.918 | 0.918 | -0.918 |
| values_change_1977_1981_v1_satfin | 0.855 | 0.479 | 0.265 | 0.918 | 1.000 | 1.000 | -1.000 |
| values_change_1977_1981_v2_satfin | 0.855 | 0.479 | 0.265 | 0.918 | 1.000 | 1.000 | -1.000 |
| values_change_1977_1981_v3_satfin | -0.855 | -0.479 | -0.265 | -0.918 | -1.000 | -1.000 | 1.000 |
Correlation of variables for the 1993 - 1995 period is presented in the below table.
Code
kable(cor(Input_linear_model_93[,-1]))
#> Warning in cor(Input_linear_model_93[, -1]): the standard deviation
#> is zero
| | Effective_Rate | Flexible_CPI_.monthly. | Unemployment_rate | Home_Price_Index | Average_Value | values_change_1993_1995_v1_jsc | values_change_1993_1995_v2_jsc | values_change_1993_1995_v3_jsc | values_change_1993_1995_v4_jsc | values_change_1993_1995_v1_satfin | values_change_1993_1995_v2_satfin | values_change_1993_1995_v3_satfin |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Effective_Rate | 1.000 | 0.022 | -0.960 | 0.979 | 0.744 | -0.968 | 0.968 | 0.968 | NA | 0.968 | -0.968 | 0.968 |
| Flexible_CPI_.monthly. | 0.022 | 1.000 | 0.073 | -0.031 | 0.009 | 0.040 | -0.040 | -0.040 | NA | -0.040 | 0.040 | -0.040 |
| Unemployment_rate | -0.960 | 0.073 | 1.000 | -0.959 | -0.841 | 0.949 | -0.949 | -0.949 | NA | -0.949 | 0.949 | -0.949 |
| Home_Price_Index | 0.979 | -0.031 | -0.959 | 1.000 | 0.802 | -0.997 | 0.997 | 0.997 | NA | 0.997 | -0.997 | 0.997 |
| Average_Value | 0.744 | 0.009 | -0.841 | 0.802 | 1.000 | -0.809 | 0.809 | 0.809 | NA | 0.809 | -0.809 | 0.809 |
| values_change_1993_1995_v1_jsc | -0.968 | 0.040 | 0.949 | -0.997 | -0.809 | 1.000 | -1.000 | -1.000 | NA | -1.000 | 1.000 | -1.000 |
| values_change_1993_1995_v2_jsc | 0.968 | -0.040 | -0.949 | 0.997 | 0.809 | -1.000 | 1.000 | 1.000 | NA | 1.000 | -1.000 | 1.000 |
| values_change_1993_1995_v3_jsc | 0.968 | -0.040 | -0.949 | 0.997 | 0.809 | -1.000 | 1.000 | 1.000 | NA | 1.000 | -1.000 | 1.000 |
| values_change_1993_1995_v4_jsc | NA | NA | NA | NA | NA | NA | NA | NA | 1 | NA | NA | NA |
| values_change_1993_1995_v1_satfin | 0.968 | -0.040 | -0.949 | 0.997 | 0.809 | -1.000 | 1.000 | 1.000 | NA | 1.000 | -1.000 | 1.000 |
| values_change_1993_1995_v2_satfin | -0.968 | 0.040 | 0.949 | -0.997 | -0.809 | 1.000 | -1.000 | -1.000 | NA | -1.000 | 1.000 | -1.000 |
| values_change_1993_1995_v3_satfin | 0.968 | -0.040 | -0.949 | 0.997 | 0.809 | -1.000 | 1.000 | 1.000 | NA | 1.000 | -1.000 | 1.000 |
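The NA entries above stem from columns that are constant over the sampled period, which also triggers the `cor()` warning "the standard deviation is zero". Below is a minimal, hedged sketch of how such zero-variance columns could be screened out before computing correlations; the helper `drop_zero_variance` and the toy data are illustrative and not part of the report's actual pipeline.

```r
# Identify zero-variance columns before calling cor(); NA correlations
# arise whenever a column is constant within the sampled period.
drop_zero_variance <- function(df) {
  keep <- vapply(df, function(x) sd(x, na.rm = TRUE) > 0, logical(1))
  df[, keep, drop = FALSE]
}

# Illustrative data: 'constant' has zero variance and would yield NA rows
toy <- data.frame(rate = c(1, 2, 3), cpi = c(2, 1, 3), constant = c(5, 5, 5))
cleaned <- drop_zero_variance(toy)
names(cleaned)  # "rate" "cpi"
```

Applying such a filter would keep the correlation tables free of NA rows without changing any of the remaining values.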
Correlation of variables for the 2003 - 2007 period is presented in the below table.
Code
kable(cor(Input_linear_model_2003[,-1]))
| | Effective_Rate | Flexible_CPI_.monthly. | Unemployment_rate | Home_Price_Index | Average_Value | values_change_2003_2007_v1_jsc | values_change_2003_2007_v2_jsc | values_change_2003_2007_v3_jsc | values_change_2003_2007_v4_jsc | values_change_2003_2007_v1_satfin | values_change_2003_2007_v2_satfin | values_change_2003_2007_v3_satfin | Income_yes_responds |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Effective_Rate | 1.000 | -0.077 | -0.966 | 0.968 | 0.659 | -0.979 | -0.979 | -0.979 | 0.979 | -0.925 | 0.925 | 0.925 | -0.985 |
| Flexible_CPI_.monthly. | -0.077 | 1.000 | 0.062 | -0.019 | -0.177 | 0.054 | 0.054 | 0.054 | -0.054 | 0.014 | -0.014 | -0.014 | 0.070 |
| Unemployment_rate | -0.966 | 0.062 | 1.000 | -0.968 | -0.583 | 0.983 | 0.983 | 0.983 | -0.983 | 0.959 | -0.959 | -0.959 | 0.982 |
| Home_Price_Index | 0.968 | -0.019 | -0.968 | 1.000 | 0.568 | -0.983 | -0.983 | -0.983 | 0.983 | -0.978 | 0.978 | 0.978 | -0.972 |
| Average_Value | 0.659 | -0.177 | -0.583 | 0.568 | 1.000 | -0.588 | -0.588 | -0.588 | 0.588 | -0.491 | 0.491 | 0.491 | -0.614 |
| values_change_2003_2007_v1_jsc | -0.979 | 0.054 | 0.983 | -0.983 | -0.588 | 1.000 | 1.000 | 1.000 | -1.000 | 0.978 | -0.978 | -0.978 | 0.997 |
| values_change_2003_2007_v2_jsc | -0.979 | 0.054 | 0.983 | -0.983 | -0.588 | 1.000 | 1.000 | 1.000 | -1.000 | 0.978 | -0.978 | -0.978 | 0.997 |
| values_change_2003_2007_v3_jsc | -0.979 | 0.054 | 0.983 | -0.983 | -0.588 | 1.000 | 1.000 | 1.000 | -1.000 | 0.978 | -0.978 | -0.978 | 0.997 |
| values_change_2003_2007_v4_jsc | 0.979 | -0.054 | -0.983 | 0.983 | 0.588 | -1.000 | -1.000 | -1.000 | 1.000 | -0.978 | 0.978 | 0.978 | -0.997 |
| values_change_2003_2007_v1_satfin | -0.925 | 0.014 | 0.959 | -0.978 | -0.491 | 0.978 | 0.978 | 0.978 | -0.978 | 1.000 | -1.000 | -1.000 | 0.961 |
| values_change_2003_2007_v2_satfin | 0.925 | -0.014 | -0.959 | 0.978 | 0.491 | -0.978 | -0.978 | -0.978 | 0.978 | -1.000 | 1.000 | 1.000 | -0.961 |
| values_change_2003_2007_v3_satfin | 0.925 | -0.014 | -0.959 | 0.978 | 0.491 | -0.978 | -0.978 | -0.978 | 0.978 | -1.000 | 1.000 | 1.000 | -0.961 |
| Income_yes_responds | -0.985 | 0.070 | 0.982 | -0.972 | -0.614 | 0.997 | 0.997 | 0.997 | -0.997 | 0.961 | -0.961 | -0.961 | 1.000 |
Correlation of variables for the 2021 - 2023 period is presented in the below table.
Code
kable(cor(Input_linear_model_2021[,-1]))
#> Warning in cor(Input_linear_model_2021[, -1]): the standard
#> deviation is zero
| | Effective_Rate | Flexible_CPI_.monthly. | Unemployment_rate | Home_Price_Index | Average_Value | values_change_2021_2023_v1_jsc | values_change_2021_2023_v2_jsc | values_change_2021_2023_v3_jsc | values_change_2021_2023_v4_jsc | values_change_2021_2023_v1_satfin | values_change_2021_2023_v2_satfin | values_change_2021_2023_v3_satfin | Income_yes_responds |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Effective_Rate | 1.000 | -0.568 | -0.548 | NA | 0.896 | -0.974 | 0.974 | 0.974 | NA | 0.973 | 0.973 | -0.973 | 0.974 |
| Flexible_CPI_.monthly. | -0.568 | 1.000 | 0.508 | NA | -0.571 | 0.522 | -0.522 | -0.522 | NA | -0.513 | -0.513 | 0.513 | -0.547 |
| Unemployment_rate | -0.548 | 0.508 | 1.000 | NA | -0.720 | 0.601 | -0.601 | -0.601 | NA | -0.587 | -0.587 | 0.587 | -0.641 |
| Home_Price_Index | NA | NA | NA | 1 | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| Average_Value | 0.896 | -0.571 | -0.720 | NA | 1.000 | -0.908 | 0.908 | 0.908 | NA | 0.901 | 0.901 | -0.901 | 0.928 |
| values_change_2021_2023_v1_jsc | -0.974 | 0.522 | 0.601 | NA | -0.908 | 1.000 | -1.000 | -1.000 | NA | -1.000 | -1.000 | 1.000 | -0.998 |
| values_change_2021_2023_v2_jsc | 0.974 | -0.522 | -0.601 | NA | 0.908 | -1.000 | 1.000 | 1.000 | NA | 1.000 | 1.000 | -1.000 | 0.998 |
| values_change_2021_2023_v3_jsc | 0.974 | -0.522 | -0.601 | NA | 0.908 | -1.000 | 1.000 | 1.000 | NA | 1.000 | 1.000 | -1.000 | 0.998 |
| values_change_2021_2023_v4_jsc | NA | NA | NA | NA | NA | NA | NA | NA | 1 | NA | NA | NA | NA |
| values_change_2021_2023_v1_satfin | 0.973 | -0.513 | -0.587 | NA | 0.901 | -1.000 | 1.000 | 1.000 | NA | 1.000 | 1.000 | -1.000 | 0.996 |
| values_change_2021_2023_v2_satfin | 0.973 | -0.513 | -0.587 | NA | 0.901 | -1.000 | 1.000 | 1.000 | NA | 1.000 | 1.000 | -1.000 | 0.996 |
| values_change_2021_2023_v3_satfin | -0.973 | 0.513 | 0.587 | NA | -0.901 | 1.000 | -1.000 | -1.000 | NA | -1.000 | -1.000 | 1.000 | -0.996 |
| Income_yes_responds | 0.974 | -0.547 | -0.641 | NA | 0.928 | -0.998 | 0.998 | 0.998 | NA | 0.996 | 0.996 | -0.996 | 1.000 |
As expected, the Fed Interest rate is correlated with almost every variable, which supports the claim that it is the main driver of these variables. This study, however, focuses on how the effect of interest rates, and the behavior of each generation, changes over time. Because some periods contain less data than others, which can affect the correlation values, only a general assessment is made. It should be noted that the Fed Interest rate is increasing in every period. In addition, a strong correlation between the survey answers was observed; therefore, only one of the answer values is taken into consideration.
For example, as the tables show, the correlation between interest rates and CPI values decreases steadily over the periods. This fact alone suggests that the Fed's perspective is changing: in the past, interest rates were raised only while the CPI was increasing, whereas today interest rates are being raised even while the CPI is decreasing.
Secondly, the correlation between interest rates and unemployment is almost constant for the 1993-1995 and 2003-2007 periods, while different values are observed for the other periods; for 1977-1981, for example, the correlation is positive. Although it can be argued that interest rates have a delayed negative effect on the unemployment rate, the current unemployment rate can still affect people's decisions about buying a new home or saving money.
Thirdly, because the logic behind all the survey results is the same, the correlations between them are almost exactly 1 or -1. Therefore, only one survey result is used in each linear regression model.
Considering these statements and the correlation values, all applicable input variables are used in each linear model. The analyses of the linear models are performed in the next section.
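Because several survey variables are correlated at almost exactly ±1, keeping only one of them avoids perfect multicollinearity in the regressions. The sketch below shows one hedged way this pruning could be automated; the helper `prune_correlated`, the 0.99 threshold, and the toy data are illustrative assumptions, not the procedure actually used in this report.

```r
# Drop columns whose absolute correlation with an earlier retained
# column exceeds a threshold; the first column of each group is kept.
prune_correlated <- function(df, threshold = 0.99) {
  cors <- abs(cor(df, use = "pairwise.complete.obs"))
  keep <- rep(TRUE, ncol(df))
  for (j in seq_len(ncol(df))[-1]) {
    earlier <- seq_len(j - 1)
    if (any(cors[j, earlier] > threshold & keep[earlier])) keep[j] <- FALSE
  }
  df[, keep, drop = FALSE]
}

# 'v2' is an exact multiple of 'v1' (correlation 1) and is dropped
toy <- data.frame(v1 = c(1, 2, 3, 4), v2 = c(2, 4, 6, 8), u = c(4, 1, 3, 2))
names(prune_correlated(toy))  # "v1" "u"
```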
\(\color{darkblue}{\text{6. Analysis}}\)
#> Warning: package 'lmtest' was built under R version 4.3.2
In this section, the results of the linear regression analyses performed will be presented. The input data for the models have already been created in the previous sections of this report. However, in order to equalize the effects of all variables, the input data should be scaled. For this reason, the scale_0_to_1 function was created.
Code
scale_0_to_1 <- function(x) {
(x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}
\(\color{darkblue}{\text{6.1. Research question:}}\) \(\color{darkblue}{\text{Home ownership and}}\) \(\color{darkblue}{\text{differences}}\) \(\color{darkblue}{\text{of behavior between}}\) \(\color{darkblue}{\text{generations}}\)
The following section examines the relationship between interest rates and home ownership on the one hand, and investigates differences in behavior between generations in the United States on the other. The initial step was to develop a linear model for each of the previously selected periods.
\(\color{darkblue}{\text{6.1.1. For the 1993-1995}}\) \(\color{darkblue}{\text{period}}\)
The input data is scaled by using the scale_0_to_1 function that was previously created.
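As a quick sanity check, the behaviour of `scale_0_to_1` can be illustrated on a small vector: the minimum maps to 0, the maximum to 1, and NA values are ignored for the range but preserved in the output. The function is reproduced from the definition above; the sample values are illustrative.

```r
# Min-max scaling as defined earlier in the report; NA values are
# ignored when computing the range and preserved in the output.
scale_0_to_1 <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

x <- c(2, 4, NA, 6)
scale_0_to_1(x)  # 0.0 0.5 NA 1.0
```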
Code
Q1_home_model_93_input <- Input_linear_model_93
Q1_home_model_93_input_scaled_data <- NULL
Q1_home_model_93_input_scaled_data <- as.data.frame(lapply(Q1_home_model_93_input[-1], scale_0_to_1))
For this linear regression model, only the Job Security survey data was used. The final input variables are therefore the Federal Interest rate, CPI, Unemployment rate, Home Price index, Mortgage rate, and Job Security survey answer 1, where answer 1 represents the response "my job security is high". After scaling the data, the Home Ownership data was added to the data frame and a linear regression model was fitted. The outcome of the regression can be observed in the "Code" pop-up section.
Code
Q1_home_model_93_input_scaled_data$home_ownership <- Home_ownership_1993_1995$Home_Ownership_Rate
Q1_home_model_93_input_scaled_data$values_change_1993_1995_v4_jsc <- Q1_home_model_93_input$values_change_1993_1995_v4_jsc
Q1_home_model_93 <- lm( home_ownership ~ Effective_Rate +Flexible_CPI_.monthly. + Unemployment_rate + Home_Price_Index + Average_Value + values_change_1993_1995_v1_jsc , data = Q1_home_model_93_input_scaled_data)
# satisfaction survey dominates the model, and for 2003 analysis, satisfaction survey has no power. Thus, for better results satisfaction survey was not considered.
summary(Q1_home_model_93)
#>
#> Call:
#> lm(formula = home_ownership ~ Effective_Rate + Flexible_CPI_.monthly. +
#> Unemployment_rate + Home_Price_Index + Average_Value + values_change_1993_1995_v1_jsc,
#> data = Q1_home_model_93_input_scaled_data)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -0.20147 -0.04158 0.00567 0.03862 0.14645
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 63.453 1.392 45.59 9.9e-16
#> Effective_Rate 1.998 0.494 4.04 0.0014
#> Flexible_CPI_.monthly. -0.222 0.104 -2.14 0.0519
#> Unemployment_rate 0.725 0.420 1.73 0.1081
#> Home_Price_Index -0.623 1.448 -0.43 0.6738
#> Average_Value -0.387 0.188 -2.06 0.0603
#> values_change_1993_1995_v1_jsc 0.144 1.263 0.11 0.9111
#>
#> (Intercept) ***
#> Effective_Rate **
#> Flexible_CPI_.monthly. .
#> Unemployment_rate
#> Home_Price_Index
#> Average_Value .
#> values_change_1993_1995_v1_jsc
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.101 on 13 degrees of freedom
#> Multiple R-squared: 0.893, Adjusted R-squared: 0.844
#> F-statistic: 18.2 on 6 and 13 DF, p-value: 1.25e-05
Q1_home_model_93_2 <- Q1_home_model_93
\(\color{darkblue}{\text{6.1.2. For the 2003-2007}}\) \(\color{darkblue}{\text{period}}\)
The input data is scaled by using the scale_0_to_1 function created previously.
Code
Q1_home_model_2003_input <- Input_linear_model_2003
Q1_home_model_2003_input_scaled_data <- NULL
Q1_home_model_2003_input_scaled_data <- as.data.frame(lapply(Q1_home_model_2003_input[-1], scale_0_to_1))
After scaling the data, a linear regression model was fitted; its results can be observed in the "Code" pop-up section. The input variables are the Federal Interest rate, CPI, Unemployment rate, Home Price Index, Mortgage rate, and the Job Security survey.
Code
Q1_home_model_2003_input_scaled_data$home_ownership <- Home_ownership_2003_2007$Home_Ownership_Rate
Q1_home_model_2003 <- lm( home_ownership ~ Effective_Rate +Flexible_CPI_.monthly. + Unemployment_rate + Home_Price_Index + Average_Value + values_change_2003_2007_v1_jsc , data = Q1_home_model_2003_input_scaled_data)
summary(Q1_home_model_2003)
#>
#> Call:
#> lm(formula = home_ownership ~ Effective_Rate + Flexible_CPI_.monthly. +
#> Unemployment_rate + Home_Price_Index + Average_Value + values_change_2003_2007_v1_jsc,
#> data = Q1_home_model_2003_input_scaled_data)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -0.4700 -0.0956 0.0090 0.0719 0.3922
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 71.3219 0.5785 123.28 < 2e-16
#> Effective_Rate -2.9279 0.4178 -7.01 4.4e-08
#> Flexible_CPI_.monthly. -0.0503 0.1629 -0.31 0.7593
#> Unemployment_rate 0.6663 0.5568 1.20 0.2398
#> Home_Price_Index 0.1780 0.4543 0.39 0.6976
#> Average_Value 0.5708 0.1602 3.56 0.0011
#> values_change_2003_2007_v1_jsc -3.7781 0.7769 -4.86 2.6e-05
#>
#> (Intercept) ***
#> Effective_Rate ***
#> Flexible_CPI_.monthly.
#> Unemployment_rate
#> Home_Price_Index
#> Average_Value **
#> values_change_2003_2007_v1_jsc ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.174 on 34 degrees of freedom
#> Multiple R-squared: 0.616, Adjusted R-squared: 0.548
#> F-statistic: 9.08 on 6 and 34 DF, p-value: 6.01e-06
The input variables are therefore the following: Fed effective rate, CPI, Unemployment, Home price, Mortgage rate, and Job Security survey answer 1. The coefficients of the linear regression model are displayed in the table below.
Code
Q1_home_model_2003_2 <- lm( home_ownership ~ Effective_Rate +Flexible_CPI_.monthly. + Unemployment_rate + Home_Price_Index + Average_Value + values_change_2003_2007_v1_jsc , data = Q1_home_model_2003_input_scaled_data)
summary(Q1_home_model_2003_2)
#>
#> Call:
#> lm(formula = home_ownership ~ Effective_Rate + Flexible_CPI_.monthly. +
#> Unemployment_rate + Home_Price_Index + Average_Value + values_change_2003_2007_v1_jsc,
#> data = Q1_home_model_2003_input_scaled_data)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -0.4700 -0.0956 0.0090 0.0719 0.3922
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 71.3219 0.5785 123.28 < 2e-16
#> Effective_Rate -2.9279 0.4178 -7.01 4.4e-08
#> Flexible_CPI_.monthly. -0.0503 0.1629 -0.31 0.7593
#> Unemployment_rate 0.6663 0.5568 1.20 0.2398
#> Home_Price_Index 0.1780 0.4543 0.39 0.6976
#> Average_Value 0.5708 0.1602 3.56 0.0011
#> values_change_2003_2007_v1_jsc -3.7781 0.7769 -4.86 2.6e-05
#>
#> (Intercept) ***
#> Effective_Rate ***
#> Flexible_CPI_.monthly.
#> Unemployment_rate
#> Home_Price_Index
#> Average_Value **
#> values_change_2003_2007_v1_jsc ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.174 on 34 degrees of freedom
#> Multiple R-squared: 0.616, Adjusted R-squared: 0.548
#> F-statistic: 9.08 on 6 and 34 DF, p-value: 6.01e-06
kable(Q1_home_model_2003_2$coefficients)
| | x |
|---|---|
| (Intercept) | 71.322 |
| Effective_Rate | -2.928 |
| Flexible_CPI_.monthly. | -0.050 |
| Unemployment_rate | 0.666 |
| Home_Price_Index | 0.178 |
| Average_Value | 0.571 |
| values_change_2003_2007_v1_jsc | -3.778 |
\(\color{darkblue}{\text{6.1.3. For the 2021-2023}}\) \(\color{darkblue}{\text{period}}\)
The input data is scaled by using the scale_0_to_1 function created previously.
Code
Input_linear_model_2021_2 <- Input_linear_model_2021[1:19,]
Q1_home_model_2021_input <- Input_linear_model_2021_2
Q1_home_model_2021_input_scaled_data <- NULL
Q1_home_model_2021_input_scaled_data <- as.data.frame(lapply(Q1_home_model_2021_input[-1], scale_0_to_1))
Q1_home_model_2021_input_scaled_data$values_change_2021_2023_v4_jsc <- Input_linear_model_2021_2$values_change_2021_2023_v4_jsc
As in the previous periods, the same steps are carried out: scaling the data, adding Home Ownership to the data frame, and fitting a linear model. The input variables are Fed Interest rate + CPI + Unemployment rate + Home Price Index + Mortgage rate + Job Security survey.
Code
Q1_home_model_2021_input_scaled_data$home_ownership <- Home_ownership_2021_2023$Home_Ownership_Rate
Q1_home_model_2021 <- lm( home_ownership ~ Effective_Rate +Flexible_CPI_.monthly. + Unemployment_rate+ Home_Price_Index + Average_Value + values_change_2021_2023_v1_jsc, data = Q1_home_model_2021_input_scaled_data)
summary(Q1_home_model_2021)
#>
#> Call:
#> lm(formula = home_ownership ~ Effective_Rate + Flexible_CPI_.monthly. +
#> Unemployment_rate + Home_Price_Index + Average_Value + values_change_2021_2023_v1_jsc,
#> data = Q1_home_model_2021_input_scaled_data)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -0.1846 -0.0778 0.0415 0.0805 0.1416
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 64.5481 0.9206 70.12 <2e-16
#> Effective_Rate 0.7924 0.7670 1.03 0.322
#> Flexible_CPI_.monthly. -0.1156 0.1194 -0.97 0.352
#> Unemployment_rate 0.2890 0.3263 0.89 0.393
#> Home_Price_Index 0.7577 0.3925 1.93 0.078
#> Average_Value 0.0227 0.3371 0.07 0.947
#> values_change_2021_2023_v1_jsc 0.7070 0.9930 0.71 0.490
#>
#> (Intercept) ***
#> Effective_Rate
#> Flexible_CPI_.monthly.
#> Unemployment_rate
#> Home_Price_Index .
#> Average_Value
#> values_change_2021_2023_v1_jsc
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.122 on 12 degrees of freedom
#> Multiple R-squared: 0.836, Adjusted R-squared: 0.754
#> F-statistic: 10.2 on 6 and 12 DF, p-value: 4e-04
The input variables are therefore Fed effective rate + CPI + Unemployment + Home Price Index + Mortgage rate + Job Security survey answer 1. The resulting coefficients of the linear regression model are presented in the table below.
Code
Q1_home_model_2021_2 <- lm( home_ownership ~ Effective_Rate +Flexible_CPI_.monthly. + Unemployment_rate + Home_Price_Index + Average_Value + values_change_2021_2023_v1_jsc , data = Q1_home_model_2021_input_scaled_data)
summary(Q1_home_model_2021_2)
#>
#> Call:
#> lm(formula = home_ownership ~ Effective_Rate + Flexible_CPI_.monthly. +
#> Unemployment_rate + Home_Price_Index + Average_Value + values_change_2021_2023_v1_jsc,
#> data = Q1_home_model_2021_input_scaled_data)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -0.1846 -0.0778 0.0415 0.0805 0.1416
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 64.5481 0.9206 70.12 <2e-16
#> Effective_Rate 0.7924 0.7670 1.03 0.322
#> Flexible_CPI_.monthly. -0.1156 0.1194 -0.97 0.352
#> Unemployment_rate 0.2890 0.3263 0.89 0.393
#> Home_Price_Index 0.7577 0.3925 1.93 0.078
#> Average_Value 0.0227 0.3371 0.07 0.947
#> values_change_2021_2023_v1_jsc 0.7070 0.9930 0.71 0.490
#>
#> (Intercept) ***
#> Effective_Rate
#> Flexible_CPI_.monthly.
#> Unemployment_rate
#> Home_Price_Index .
#> Average_Value
#> values_change_2021_2023_v1_jsc
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.122 on 12 degrees of freedom
#> Multiple R-squared: 0.836, Adjusted R-squared: 0.754
#> F-statistic: 10.2 on 6 and 12 DF, p-value: 4e-04
kable(Q1_home_model_2021_2$coefficients)
| | x |
|---|---|
| (Intercept) | 64.548 |
| Effective_Rate | 0.792 |
| Flexible_CPI_.monthly. | -0.116 |
| Unemployment_rate | 0.289 |
| Home_Price_Index | 0.758 |
| Average_Value | 0.023 |
| values_change_2021_2023_v1_jsc | 0.707 |
For presentation purposes, a new data frame was created which includes the coefficients of each linear model as presented below.
Code
home_ownership_coef <- data.frame(Q1_home_model_2021_2$coefficients)
homecoef_2003 <- data.frame(Q1_home_model_2003_2$coefficients)
home_ownership_coef$year2003 <- homecoef_2003$Q1_home_model_2003_2.coefficients
homecoef_1993 <- data.frame(Q1_home_model_93_2$coefficients)
#homecoef_1993['Income_respond',] <- 0
home_ownership_coef$year_1993 <- homecoef_1993$Q1_home_model_93_2.coefficients
new_column_names_home_coef <- c("2021", "2003", "1993")
colnames(home_ownership_coef) <- new_column_names_home_coef
df_transposed <- t(home_ownership_coef)
df_long <- as.data.frame(df_transposed, check.names = FALSE)
df_long$Variable <- rownames(df_transposed)
coef_home_table <- df_long
coef_home_table <- t(coef_home_table)
kable(coef_home_table)
| | 2021 | 2003 | 1993 |
|---|---|---|---|
| (Intercept) | 64.5 | 71.3 | 63.5 |
| Effective_Rate | 0.792 | -2.928 | 1.998 |
| Flexible_CPI_.monthly. | -0.1156 | -0.0503 | -0.2218 |
| Unemployment_rate | 0.289 | 0.666 | 0.725 |
| Home_Price_Index | 0.758 | 0.178 | -0.623 |
| Average_Value | 0.0227 | 0.5708 | -0.3865 |
| values_change_2021_2023_v1_jsc | 0.707 | -3.778 | 0.144 |
| Variable | 2021 | 2003 | 1993 |
Code
# Melt the data frame to long format for plotting
df_long <- reshape2::melt(df_long, id.vars = "Variable")
\(\color{darkblue}{\text{6.1.4. Fed Interest rate}}\)
The bar chart below displays the coefficient of the Fed Interest rate from each linear model fitted.
Code
home_effect_rate_coef <- df_long[4:6,]
plot_ly(data = home_effect_rate_coef, x = ~Variable, y = ~value, type = 'bar', color = ~Variable, hoverinfo = 'text',
text = ~paste(variable, ": ", value)) |>
layout(title = "Interest Rate Effect Across Years",
xaxis = list(title = "Years"),
yaxis = list(title = "Weights"))
The bar chart illustrates how the interest rate coefficient fluctuates across the time periods. Although the idea that interest rates have a delayed effect on home ownership can be endorsed, it can also be argued that interest rates are not the main driver of home ownership rates. Furthermore, since data are available for only three periods, no definitive statement can be made about the overall trend; based on these three points, however, the effect of interest rates appears to be decreasing.
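One way to hedge the comparison of point estimates above is to look at confidence intervals: with so few observations per period, the interval around the interest-rate coefficient is wide, which supports the caution expressed here. A minimal sketch using base R's `confint()` follows; the data are simulated stand-ins, not the report's actual inputs.

```r
set.seed(1)
# Simulated stand-in for one period's scaled inputs and ownership rate
toy <- data.frame(rate = runif(20), cpi = runif(20))
toy$ownership <- 64 + 1.5 * toy$rate + rnorm(20, sd = 0.5)

fit <- lm(ownership ~ rate + cpi, data = toy)
confint(fit, "rate", level = 0.95)  # interval around the rate coefficient
```

If the intervals from two periods overlap substantially, the apparent change in the coefficient between those periods may not be statistically meaningful.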
\(\color{darkblue}{\text{6.1.5. Comparison of}}\) \(\color{darkblue}{\text{Coefficients}}\)
Code
home_unemp_coef <- df_long[10:12,]
chart_unemp_q1home <- plot_ly(data = home_unemp_coef, x = ~Variable, y = ~value, type = 'bar', color = ~variable, hoverinfo = 'text',
text = ~paste(variable, ": ", value))
home_income_coef <- df_long[22:24,]
# <- plot_ly(data = home_income_coef, x = ~Variable, y = ~value, type = 'bar', color = ~variable, hoverinfo = 'text',
# text = ~paste(variable, ": ", value))
home_jbsc_coef <- df_long[19:21,]
chart_jbsc_q1home <- plot_ly(data = home_jbsc_coef, x = ~Variable, y = ~value, type = 'bar', color = ~variable, hoverinfo = 'text',
text = ~paste(variable, ": ", value))
home_homeprice_coef <- df_long[13:15,]
chart_homeprice_q1home <- plot_ly(data = home_homeprice_coef, x = ~Variable, y = ~value, type = 'bar', color = ~variable, hoverinfo = 'text',
text = ~paste(variable, ": ", value))
subplot(chart_unemp_q1home, chart_homeprice_q1home,chart_jbsc_q1home, nrows = 2) |> layout(title = "Coefficients Across Years",legend = list(orientation = "h", x = 0.5, y = -0.15))
If individuals who assert that their job security is high are regarded as those employed in white-collar occupations, they can be categorized as middle and upper class. Conversely, the unemployment rate is arguably more closely related to middle- and lower-class jobs. Considering the number of individuals belonging to these classes, it can be argued that the middle-to-low class has a significantly greater impact on home ownership rates. These bar charts can be analyzed across the three periods.
Starting from 1993, the bar charts indicate a consistent decrease in the coefficient of the unemployment rate. In 1993, despite rising unemployment, people still tended to purchase homes. This can be seen as a logical outcome: individuals in the working group often seek to secure their financial stability or accommodation needs by investing in real estate when unemployment rises, since a rising unemployment rate can signal imminent economic difficulties that may affect their financial situation. It also suggests that people begin considering home purchases when economic conditions deteriorate. Consequently, it can be argued that individuals in 1993 were optimistic that, if economic conditions remained favorable, they would encounter few problems in the future. This reading is supported by the job security survey responses: the graph presents only the answer "job security is good", which appears to have almost no effect on the home ownership rate. It can therefore be concluded that whether individuals feel their job is secure did not significantly influence the decision to purchase a home in this period.
For the period starting in 2003, noticeable changes occurred. The first is in the unemployment rate, whose coefficient decreased from 0.725 to 0.666, indicating that the low-to-middle class was attempting to purchase homes while paying less attention to economic conditions. Additionally, the coefficient of the Home Price Index increased in 2003, implying that home ownership kept rising despite rising home prices. Taken together, these two factors suggest that the middle-to-low class was trying to buy homes whenever they were able to, a distinct shift in behavior compared to 1993. The job security survey, in turn, sheds light on the middle-to-upper class: the results reveal a relatively strong negative coefficient, suggesting that individuals who believed their income was sufficient and their jobs secure were not tempted to purchase a house and may have chosen to wait for house prices to fall. A negative coefficient does not necessarily mean that they sold their houses, however: the low-to-middle class primarily drives the home ownership rate, and even a slight decrease in ownership within that class can dominate the signal attributed to the middle-to-upper class. A clear change in behavior can nonetheless be observed among the middle-to-upper class.
For the period starting in 2021, noticeable changes are observed again, particularly in the unemployment rate: its coefficient decreased from 0.666 to 0.289, so unemployment now has less impact on home ownership. This suggests that the middle-to-lower class's ability to purchase homes has decreased, a significant change in their behavior. On the other hand, the survey coefficient increased, indicating that those who believe their income is sufficient and are unlikely to lose their jobs are more inclined to purchase homes, which points to a more pessimistic outlook for the future.
As a result, a significant shift in behavior can be observed across generations. In the past, when the economic situation deteriorated, the middle-to-low class still had the opportunity to purchase homes; now they have largely lost that ability. The middle-to-upper class, by contrast, has become more pessimistic and prioritizes securing accommodation more than ever before. Many factors influence these behaviors, and extensive further research would be needed to understand them comprehensively; nevertheless, even within this study, the change in behavior is apparent.
\(\color{darkblue}{\text{6.2. Research question:}}\) \(\color{darkblue}{\text{Personal saving and}}\) \(\color{darkblue}{\text{differences}}\) \(\color{darkblue}{\text{of behavior between}}\) \(\color{darkblue}{\text{generations}}\)
The input variables for all the models are the following: Fed interest rate + CPI + Unemployment + Mortgage rate + Financial satisfaction survey answer 1. The coefficients are displayed in the table below.
Code
#For period 1972 - 1974
Q1_saving_model_1972_input <- Input_linear_model_72
Q1_saving_model_1972_input_scaled_data <- NULL
Q1_saving_model_1972_input_scaled_data <- as.data.frame(lapply(Q1_saving_model_1972_input[-1], scale_0_to_1))
Q1_saving_model_1972_input_scaled_data$personal_saving <- Personal_saving_rate_1972_1974$Rate
Q1_saving_model_1972 <- lm( personal_saving ~ Effective_Rate +Flexible_CPI_.monthly. + Unemployment_rate + Average_Value + values_change_1972_1974_v1_satfin, data = Q1_saving_model_1972_input_scaled_data)
#For period 1977 - 1981
Q1_saving_model_1977_input <- Input_linear_model_77
Q1_saving_model_1977_input_scaled_data <- NULL
Q1_saving_model_1977_input_scaled_data <- as.data.frame(lapply(Q1_saving_model_1977_input[-1], scale_0_to_1))
Q1_saving_model_1977_input_scaled_data$personal_saving <- Personal_saving_rate_1977_1981$Rate
Q1_saving_model_1977 <- lm( personal_saving ~ Effective_Rate +Flexible_CPI_.monthly. + Unemployment_rate + Average_Value + values_change_1977_1981_v1_satfin, data = Q1_saving_model_1977_input_scaled_data)
#For period 1993 - 1995
Q1_saving_model_1993_input <- Input_linear_model_93
Q1_saving_model_1993_input_scaled_data <- NULL
Q1_saving_model_1993_input_scaled_data <- as.data.frame(lapply(Q1_saving_model_1993_input[-1], scale_0_to_1))
Q1_saving_model_1993_input_scaled_data$personal_saving <- Personal_saving_rate_1993_1995$Rate
Q1_saving_model_1993 <- lm( personal_saving ~ Effective_Rate +Flexible_CPI_.monthly. + Unemployment_rate + Average_Value + values_change_1993_1995_v1_satfin, data = Q1_saving_model_1993_input_scaled_data)
#For period 2003 - 2007
Q1_saving_model_2003_input <- Input_linear_model_2003
Q1_saving_model_2003_input_scaled_data <- NULL
Q1_saving_model_2003_input_scaled_data <- as.data.frame(lapply(Q1_saving_model_2003_input[-1], scale_0_to_1))
Q1_saving_model_2003_input_scaled_data$personal_saving <- Personal_saving_rate_2003_2007$Rate
Q1_saving_model_2003 <- lm( personal_saving ~ Effective_Rate +Flexible_CPI_.monthly. + Unemployment_rate + Average_Value + values_change_2003_2007_v1_satfin, data = Q1_saving_model_2003_input_scaled_data)
#For period 2021 - 2023
Q1_saving_model_2021_input <- Input_linear_model_2021
Q1_saving_model_2021_input_scaled_data <- NULL
Q1_saving_model_2021_input_scaled_data <- as.data.frame(lapply(Q1_saving_model_2021_input[-1], scale_0_to_1))
Q1_saving_model_2021_input_scaled_data$personal_saving <- Personal_saving_rate_2021_2023$Rate
Q1_saving_model_2021 <- lm( personal_saving ~ Effective_Rate +Flexible_CPI_.monthly. + Unemployment_rate + Average_Value + values_change_2021_2023_v1_satfin, data = Q1_saving_model_2021_input_scaled_data)
personal_saving_coef <- data.frame(Q1_saving_model_2021$coefficients)
personal_saving_coef$year2003 <- Q1_saving_model_2003$coefficients
personal_saving_coef$year1993 <- Q1_saving_model_1993$coefficients
personal_saving_coef$year1977 <- Q1_saving_model_1977$coefficients
personal_saving_coef$year1972 <- Q1_saving_model_1972$coefficients
new_column_names_saving_coef <- c("2021", "2003", "1993", "1977", "1972")
colnames(personal_saving_coef) <- new_column_names_saving_coef
kable(personal_saving_coef)

|  | 2021 | 2003 | 1993 | 1977 | 1972 |
|---|---|---|---|---|---|
| (Intercept) | 5.002 | -0.491 | 8.638 | 9.15 | 14.541 |
| Effective_Rate | 5.882 | 3.293 | -3.588 | 1.75 | -3.722 |
| Flexible_CPI_.monthly. | 0.269 | 0.055 | -0.095 | 0.47 | -1.679 |
| Unemployment_rate | 1.016 | 1.647 | -1.799 | 1.65 | -0.968 |
| Average_Value | -5.518 | -0.612 | -0.910 | -2.73 | 0.595 |
| values_change_2021_2023_v1_satfin | -1.497 | 3.446 | 2.046 | 2.32 | 3.126 |
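The models above rescale every predictor with a `scale_0_to_1` helper that is defined elsewhere in the report. A minimal min-max implementation consistent with that usage (this exact definition is an assumption, not taken from the report) would be:

```r
# Hypothetical definition of the scale_0_to_1 helper used above:
# min-max rescaling of a numeric vector onto [0, 1].
scale_0_to_1 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (diff(rng) == 0) return(rep(0, length(x)))  # guard: constant column
  (x - rng[1]) / (rng[2] - rng[1])
}

# Example: scale_0_to_1(c(2, 4, 6)) gives 0.0, 0.5, 1.0
```

Rescaling to a common [0, 1] range is what makes the coefficient magnitudes in the table roughly comparable across predictors measured in different units.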
\(\color{darkblue}{\text{6.2.1. Comparison of}}\) \(\color{darkblue}{\text{Coefficients}}\)
The coefficients for each year are compared in the charts below.
Code
df_transposed_saving <- t(personal_saving_coef)
df_long_s <- as.data.frame(df_transposed_saving, check.names = FALSE)
df_long_s$Variable <- rownames(df_transposed_saving)
# Melt the data frame to long format for plotting
df_long_s <- reshape2::melt(df_long_s, id.vars = "Variable")
saving_mort_coef <- df_long_s[21:25,]
chart_mort_q1saving <- plot_ly(data = saving_mort_coef, x = ~Variable, y = ~value, type = 'bar', color = ~variable, hoverinfo = 'text',
text = ~paste(variable, ": ", value))
saving_satfin_coef <- df_long_s[26:30,]
chart_satfin_q1saving <- plot_ly(data = saving_satfin_coef, x = ~Variable, y = ~value, type = 'bar', color = ~variable, hoverinfo = 'text',
text = ~paste(variable, ": ", value))
saving_fed_coef <- df_long_s[6:10,]
chart_fed_q1saving <- plot_ly(data = saving_fed_coef, x = ~Variable, y = ~value, type = 'bar', color = ~variable, hoverinfo = 'text',
text = ~paste(variable, ": ", value))
saving_cpi_coef <- df_long_s[11:15,]
chart_cpi_q1saving <- plot_ly(data = saving_cpi_coef, x = ~Variable, y = ~value, type = 'bar', color = ~variable, hoverinfo = 'text',
text = ~paste(variable, ": ", value))
subplot(chart_fed_q1saving, chart_cpi_q1saving,chart_satfin_q1saving , nrows = 2) |> layout(title = "Coefficients Across Years",legend = list(orientation = "h", x = 0.5, y = -0.15))
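The five per-period fits above are identical apart from their inputs. Assuming each period's input frame carries the same predictor columns plus one period-specific satfin column, the repetition could be collapsed into a single helper; the function name and the renaming of the satfin column below are illustrative, not from the report:

```r
# Sketch: fit one per-period saving model from the raw inputs for
# that period. Renames the period-specific satfin column to a common
# name so a single formula covers every period.
fit_saving_model <- function(inputs, saving_rate, satfin_col) {
  scaled <- as.data.frame(lapply(inputs[-1], scale_0_to_1))
  names(scaled)[names(scaled) == satfin_col] <- "satfin"
  scaled$personal_saving <- saving_rate
  lm(personal_saving ~ Effective_Rate + Flexible_CPI_.monthly. +
       Unemployment_rate + Average_Value + satfin, data = scaled)
}

# Usage mirroring the 1972 block above:
# Q1_saving_model_1972 <- fit_saving_model(Input_linear_model_72,
#   Personal_saving_rate_1972_1974$Rate,
#   "values_change_1972_1974_v1_satfin")
```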
Federal interest rate and savings: The analysis initially revealed the expected negative correlation between federal interest rates and saving behavior: higher rates raise the cost of borrowing and can squeeze disposable income, leaving less room to save. However, the relationship turns positive in 2003, and an intriguing further increase was noted in 2021. This shift could stem from government interventions in response to the COVID-19 pandemic: the sudden surge in savings may be directly linked to the financial assistance provided, producing a momentary spike. The correlation analysis suggests, though, that such abrupt changes may have limited long-term impact. Uncertain economic times could still prompt individuals to reconsider their habits and save more as a precautionary measure.
Satisfaction with income and savings: A compelling trend emerged in the relationship between individuals' satisfaction with their income and their saving behavior: greater satisfaction with income aligned with a stronger inclination to save. This points to the psychological side of saving, suggesting that contentment with one's income significantly guides saving decisions. Individuals who perceive themselves as financially secure tend to prioritize saving for the future, reflecting both a sense of stability and confidence in their financial situation. Beyond the quantitative aspects of income, then, the emotional aspect plays a pivotal role in shaping saving behavior: feelings of contentment and financial security act as drivers of prudent financial habits, underscoring the influential role of psychological factors in financial decision-making. Conversely, a negative correlation appears in 2021, which may signal a behavioral change. Two explanations can be offered: first, people who said their income was good spent more and saved less; second, even people who said their income was not good saved more, and the percentage difference shows up as a negative correlation. Given the interest-rate correlation discussed above, the second explanation is the more likely.
Consumer Price Index (CPI) and savings: Conventional economic wisdom would anticipate a negative correlation between the CPI and savings, yet the analysis uncovered an unexpected positive one. This suggests a nuanced response to CPI changes: rather than reducing savings when the CPI is low, people appear to react to CPI movements by increasing their savings and potentially limiting their spending. This cautious behavior indicates that individuals, even in times of low CPI, choose to save more in response to perceived economic fluctuations.
\(\color{darkblue}{\text{6.3. Research question:}}\) \(\color{darkblue}{\text{Consumer spendings and}}\) \(\color{darkblue}{\text{differences}}\) \(\color{darkblue}{\text{of behavior between}}\) \(\color{darkblue}{\text{generations}}\)
The input variables for all the models are the following: Fed interest rate + CPI + Unemployment + Mortgage rate + Financial satisfaction survey answer 1. The coefficients are displayed in the table below.
Code
#For period 1972 - 1974
Q1_spending_model_1972_input <- Input_linear_model_72
Q1_spending_model_1972_input_scaled_data <- NULL
Q1_spending_model_1972_input_scaled_data <- as.data.frame(lapply(Q1_spending_model_1972_input[-1], scale_0_to_1))
Q1_spending_model_1972_input_scaled_data$personal_spending <- Consumption_expenditures_1972_1974$Personal_Consumption
Q1_spending_model_1972 <- lm( personal_spending ~ Effective_Rate +Flexible_CPI_.monthly. + Unemployment_rate + Average_Value + values_change_1972_1974_v1_satfin, data = Q1_spending_model_1972_input_scaled_data)
#For period 1977 - 1981
Q1_spending_model_1977_input <- Input_linear_model_77
Q1_spending_model_1977_input_scaled_data <- NULL
Q1_spending_model_1977_input_scaled_data <- as.data.frame(lapply(Q1_spending_model_1977_input[-1], scale_0_to_1))
Q1_spending_model_1977_input_scaled_data$personal_spending <- Consumption_expenditures_1977_1981$Personal_Consumption
Q1_spending_model_1977 <- lm( personal_spending ~ Effective_Rate +Flexible_CPI_.monthly. + Unemployment_rate + Average_Value + values_change_1977_1981_v1_satfin, data = Q1_spending_model_1977_input_scaled_data)
#For period 1993 - 1995
Q1_spending_model_1993_input <- Input_linear_model_93
Q1_spending_model_1993_input_scaled_data <- NULL
Q1_spending_model_1993_input_scaled_data <- as.data.frame(lapply(Q1_spending_model_1993_input[-1], scale_0_to_1))
Q1_spending_model_1993_input_scaled_data$personal_spending <- Consumption_expenditures_1993_1995$Personal_Consumption
Q1_spending_model_1993 <- lm( personal_spending ~ Effective_Rate +Flexible_CPI_.monthly. + Unemployment_rate + Average_Value + values_change_1993_1995_v1_satfin, data = Q1_spending_model_1993_input_scaled_data)
#For period 2003 - 2007
Q1_spending_model_2003_input <- Input_linear_model_2003
Q1_spending_model_2003_input_scaled_data <- NULL
Q1_spending_model_2003_input_scaled_data <- as.data.frame(lapply(Q1_spending_model_2003_input[-1], scale_0_to_1))
Q1_spending_model_2003_input_scaled_data$personal_spending <- Consumption_expenditures_2003_2007$Personal_Consumption
Q1_spending_model_2003 <- lm( personal_spending ~ Effective_Rate +Flexible_CPI_.monthly. + Unemployment_rate + Average_Value + values_change_2003_2007_v1_satfin, data = Q1_spending_model_2003_input_scaled_data)
#For period 2021 - 2023
Q1_spending_model_2021_input <- Input_linear_model_2021
Q1_spending_model_2021_input_scaled_data <- NULL
Q1_spending_model_2021_input_scaled_data <- as.data.frame(lapply(Q1_spending_model_2021_input[-1], scale_0_to_1))
Q1_spending_model_2021_input_scaled_data$personal_spending <- Consumption_expenditures_2021_2023$Personal_Consumption
Q1_spending_model_2021 <- lm( personal_spending ~ Effective_Rate +Flexible_CPI_.monthly. + Unemployment_rate + Average_Value + values_change_2021_2023_v1_satfin, data = Q1_spending_model_2021_input_scaled_data)
personal_spending_coef <- data.frame(Q1_spending_model_2021$coefficients)
personal_spending_coef$year2003 <- Q1_spending_model_2003$coefficients
personal_spending_coef$year1993 <- Q1_spending_model_1993$coefficients
personal_spending_coef$year1977 <- Q1_spending_model_1977$coefficients
personal_spending_coef$year1972 <- Q1_spending_model_1972$coefficients
new_column_names_spending_coef <- c("2021", "2003", "1993", "1977", "1972")
colnames(personal_spending_coef) <- new_column_names_spending_coef
kable(personal_spending_coef)

|  | 2021 | 2003 | 1993 | 1977 | 1972 |
|---|---|---|---|---|---|
| (Intercept) | 16801.3 | 8568.03 | 4527.48 | 1211.7 | 766.91 |
| Effective_Rate | 149.5 | 822.25 | 95.98 | 92.4 | 8.66 |
| Flexible_CPI_.monthly. | 56.9 | 41.48 | -1.44 | 18.2 | -3.78 |
| Unemployment_rate | -240.5 | -237.08 | -5.17 | 28.6 | 1.15 |
| Average_Value | 334.5 | -6.65 | -29.48 | -72.9 | -4.99 |
| values_change_2021_2023_v1_satfin | 1478.3 | -516.10 | 308.61 | 590.6 | 180.79 |
\(\color{darkblue}{\text{6.3.1. Comparison of}}\) \(\color{darkblue}{\text{Coefficients}}\)
The coefficients for each year are compared in the charts below.
Code
df_transposed_spending <- t(personal_spending_coef)
df_long_spending <- as.data.frame(df_transposed_spending, check.names = FALSE)
df_long_spending$Variable <- rownames(df_transposed_spending)
# Melt the data frame to long format for plotting
df_long_spending <- reshape2::melt(df_long_spending, id.vars = "Variable")
spending_unemp_coef <- df_long_spending[16:20,]
chart_unemp_q1spending <- plot_ly(data = spending_unemp_coef, x = ~Variable, y = ~value, type = 'bar', color = ~variable, hoverinfo = 'text',
text = ~paste(variable, ": ", value))
spending_satfin_coef <- df_long_spending[26:30,]
chart_satfin_q1spending <- plot_ly(data = spending_satfin_coef, x = ~Variable, y = ~value, type = 'bar', color = ~variable, hoverinfo = 'text',
text = ~paste(variable, ": ", value))
spending_fed_coef <- df_long_spending[6:10,]
chart_fed_q1spending <- plot_ly(data = spending_fed_coef, x = ~Variable, y = ~value, type = 'bar', color = ~variable, hoverinfo = 'text',
text = ~paste(variable, ": ", value))
spending_cpi_coef <- df_long_spending[11:15,]
chart_cpi_q1spending <- plot_ly(data = spending_cpi_coef, x = ~Variable, y = ~value, type = 'bar', color = ~variable, hoverinfo = 'text',
text = ~paste(variable, ": ", value))
subplot(chart_fed_q1spending, chart_cpi_q1spending,chart_unemp_q1spending,chart_satfin_q1spending , nrows = 2) |> layout(title = "Coefficients Across Years",legend = list(orientation = "h", x = 0.5, y = -0.15))
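Selecting coefficient rows with hard-coded index ranges such as `df_long_spending[16:20, ]` silently breaks if a predictor is added or the columns are reordered. A more robust sketch (the helper name is illustrative) filters the melted data frame by coefficient name instead:

```r
# Sketch: pick the rows for one coefficient by name rather than by
# row position, so the selection survives reordering of predictors.
# After reshape2::melt(..., id.vars = "Variable"), the column
# `variable` holds the coefficient names and `Variable` the years.
coef_rows <- function(df_long, coef_name) {
  df_long[df_long$variable == coef_name, ]
}

# e.g. the unemployment rows used for chart_unemp_q1spending:
# spending_unemp_coef <- coef_rows(df_long_spending, "Unemployment_rate")
```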
Effective rate impact on spending: The examination of effective rates reveals an intriguing correlation with consumer spending over time. Notably, there’s a consistent rise in this correlation, signifying that as effective rates increase, consumer spending tends to follow suit. However, a notable peak in 2003 deviates from this consistent pattern, suggesting an anomaly possibly influenced by external factors. Nonetheless, the comprehensive trend indicates a sustained impact of inflation-driven effective rate changes on consumption habits.
Unemployment’s shifting influence: Prior to 2003, unemployment had minimal effect on consumer spending. A significant shift emerged from 2003 onward, with strong negative correlations observed in both 2003 and 2021. This suggests that individuals increasingly rely on their regular income and avoid drawing on savings unless deemed necessary, a behavior that may stem from pessimistic expectations about future economic conditions.
Income satisfaction and spending behavior: An intriguing pattern arises between income satisfaction and spending: higher levels of income satisfaction coincide with increased spending. This reflects a tendency for individuals to consume more when they consider their income satisfactory, showing a direct relationship between income contentment and expenditure habits.
These observations from the analysis of coefficients and their correlations offer valuable insights into how economic variables interact with consumer spending habits over various periods. The shifting correlations suggest evolving consumer behaviors influenced by factors such as inflation, employment outlook, and income satisfaction, reflecting dynamic trends in spending patterns over time.
\(\color{darkblue}{\text{6.4. Research question:}}\) \(\color{darkblue}{\text{Mental health and}}\) \(\color{darkblue}{\text{differences}}\) \(\color{darkblue}{\text{of behavior between}}\) \(\color{darkblue}{\text{generations}}\)
The input variables for all the models are the following: Effective Rate + Flexible CPI monthly + Unemployment rate + Home Price Index + Average Value + Income yes responds. The coefficients are displayed in the table below.
Code
#For period 2003 - 2007
Q1_mental_model_2003_input <- Input_linear_model_2003
Q1_mental_model_2003_input_scaled_data <- Q1_mental_model_2003_input
Q1_mental_model_2003_input_scaled_data <- as.data.frame(lapply(Q1_mental_model_2003_input[-1], scale_0_to_1))
Q1_mental_model_2003_input_scaled_data$mental <- mental_2003_2007_lminput$values_change_2003_2007_v1_mental
Q1_mental_model_2003 <- lm( mental ~ Effective_Rate +Flexible_CPI_.monthly. + Unemployment_rate + Home_Price_Index + Average_Value + Income_yes_responds, data = Q1_mental_model_2003_input_scaled_data)
#For period 2021 - 2023
Q1_mental_model_2021_input <- Input_linear_model_2021
Q1_mental_model_2021_input_scaled_data <- Q1_mental_model_2021_input
Q1_mental_model_2021_input_scaled_data <- as.data.frame(lapply(Q1_mental_model_2021_input[-1], scale_0_to_1))
Q1_mental_model_2021_input_scaled_data$mental <- mental_2021_2023_lminput$values_change_2021_2023_v1_mental
Q1_mental_model_2021 <- lm( mental ~ Effective_Rate +Flexible_CPI_.monthly. + Unemployment_rate + Home_Price_Index + Average_Value + Income_yes_responds, data = Q1_mental_model_2021_input_scaled_data)
mental_coef <- data.frame(Q1_mental_model_2021$coefficients)
mental_coef$year2003 <- Q1_mental_model_2003$coefficients
new_column_names_mental_coef <- c("2021", "2003")
colnames(mental_coef) <- new_column_names_mental_coef
kable(mental_coef)

|  | 2021 | 2003 |
|---|---|---|
| (Intercept) | 4.497 | 3.329 |
| Effective_Rate | -0.031 | 0.014 |
| Flexible_CPI_.monthly. | 0.002 | 0.000 |
| Unemployment_rate | 0.013 | 0.000 |
| Home_Price_Index | -0.016 | -0.025 |
| Average_Value | -0.008 | 0.001 |
| Income_yes_responds | 0.520 | 1.156 |
\(\color{darkblue}{\text{6.4.1. Comparison of}}\) \(\color{darkblue}{\text{Coefficients}}\)
The coefficients for each year are compared in the charts below.
Code
df_transposed_mental <- t(mental_coef)
df_long_mental <- as.data.frame(df_transposed_mental, check.names = FALSE)
df_long_mental$Variable <- rownames(df_transposed_mental)
# Melt the data frame to long format for plotting
df_long_mental <- reshape2::melt(df_long_mental, id.vars = "Variable")
mental_unemp_coef <- df_long_mental[7:8,]
chart_unemp_q1mental <- plot_ly(data = mental_unemp_coef, x = ~Variable, y = ~value, type = 'bar', color = ~variable, hoverinfo = 'text',
text = ~paste(variable, ": ", value))
mental_income_coef <- df_long_mental[13:14,]
chart_income_q1mental <- plot_ly(data = mental_income_coef, x = ~Variable, y = ~value, type = 'bar', color = ~variable, hoverinfo = 'text',
text = ~paste(variable, ": ", value))
mental_fed_coef <- df_long_mental[3:4,]
chart_fed_q1mental <- plot_ly(data = mental_fed_coef, x = ~Variable, y = ~value, type = 'bar', color = ~variable, hoverinfo = 'text',
text = ~paste(variable, ": ", value))
mental_cpi_coef <- df_long_mental[5:6,]
chart_cpi_q1mental <- plot_ly(data = mental_cpi_coef, x = ~Variable, y = ~value, type = 'bar', color = ~variable, hoverinfo = 'text',
text = ~paste(variable, ": ", value))
subplot(chart_fed_q1mental, chart_cpi_q1mental, chart_unemp_q1mental, chart_income_q1mental, nrows = 2) |> layout(title = "Coefficients Across Years", legend = list(orientation = "h", x = 0.5, y = -0.15))
Unemployment’s impact on mental well-being: The examination of unemployment’s impact on mental health reveals a significant shift between the two available years (2003 and 2021). Notably, there is a sharp increase in the effect of unemployment on mental health in the more recent year (2021). This heightened correlation suggests greater concern among individuals about their mental well-being when facing unemployment, likely attributable to increased financial pressure and the hardships that come with it.
Federal interest rates and inflation concerns: While the trend in federal interest rates exhibits a negative trajectory, the noteworthy increase in the CPI (Consumer Price Index) coefficient demands attention. This change implies that individuals are placing greater emphasis on inflation as it relates to their mental well-being. The parallel rise in both these factors underscores a growing sensitivity to economic indicators such as inflation, highlighting people’s heightened concerns about the cost of living and its impact on mental health.
Income dynamics and emotional response: An intriguing finding emerges regarding the relationship between income changes and mental health. When comparing the years 2003 and 2021, there’s a noticeable shift in emotional response related to income increases. Contrary to expectations, individuals appear to express reduced satisfaction or happiness with increased income levels in the more recent period (2021). This unexpected response may indicate a sense of apprehension about future economic conditions, reflecting a potential pessimistic outlook.
Given the limited dataset covering only two years, a comprehensive analysis of behavioral shifts becomes challenging. However, even within this confined scope, distinct changes are discernible. The growing concern surrounding unemployment’s influence on mental health, coupled with the intriguing dynamics observed in response to income changes and inflationary concerns, hints at a growing sensitivity towards economic factors impacting mental well-being. These findings underscore the evolving complexities in how economic variables connect with individuals’ mental health perceptions and emotional states.
\(\color{darkblue}{\text{7. Discussion}}\)
\(\color{darkblue}{\text{7.1. Conclusion}}\)
Our analysis reveals the multifaceted impact of interest rates on consumer behavior. Interest rates influence not only savings and home buying habits but also consumer spending and sentiment, as evidenced across distinct periods. The traditional relationship between interest rates and savings is evolving, with periods showing unexpected increases in savings despite rate hikes. Similarly, the effect of interest rates on home buying behavior varies, emphasizing the influence of economic conditions on personal financial decisions.
Moreover, the correlation between effective rates and consumer spending remains generally positive, excluding anomalies. Changes in consumer sentiment, evident through surveys, reflect the dynamic relationship between economic variables and sentiment shifts. Lastly, noticeable differences in behaviors over time and between generations underscore the evolving nature of consumer responses to economic factors.
Regarding the impact of interest rates on consumer behavior, the study aligns with the anticipated inverse relationship between higher interest rates and reduced savings, yet it reveals anomalies such as the unexpected positive correlation between savings and CPI changes, which defies conventional economic principles.
Examining consumer spending behaviors in relation to interest rates and other economic indicators confirms prevailing expectations, revealing a consistent rise in correlation between effective rates and spending, with deviations in 2003 and 2021 which suggest external influences. Furthermore, the interaction between income satisfaction and spending behaviors showcases the psychological aspects influencing financial decisions.
Concerning mental health, the analysis supports the hypothesis that heightened unemployment correlates with increased mental health concerns, emphasizing the significance of economic stability in shaping emotional well-being. Additionally, the surprising correlation between the increased income and the reduced emotional satisfaction in 2021 underscores a potential shift towards pessimistic economic outlooks influencing individuals’ emotional states.
\(\color{darkblue}{\text{7.2. Limitations}}\)
Through our analysis, a few limitations have been discovered:
· Temporal and dataset constraints: the study’s time and data restrictions limit its ability to explore long-term trends and broader economic changes, which affects our understanding of how consumer behaviors and sentiment evolve over time.
· Geographical focus: this research was centered on the United States, and it might not capture variations in consumer behaviors and mental health influenced by diverse global economic and cultural contexts.
· Causality and correlation: even though the analysis reveals correlations between economic variables and behaviors, establishing causal relationships requires more thorough methodologies. Factors beyond the scope of this study might influence the observed correlations, which means that caution should be taken when attributing causality solely to the analyzed variables.
· Limited survey periods: the mental health analysis is restricted to two years, which could overlook any long-term mental health patterns, and could potentially hide some nuances or cyclical trends.
\(\color{darkblue}{\text{7.3. Proposition for}}\) \(\color{darkblue}{\text{future work}}\)
A few points could aim to expand the scope of the research that has been conducted in this report and incorporate other methodologies or explore new dimensions:
· Cross-cultural studies: global datasets and cross-cultural analyses could be incorporated to explore how economic indicators influence consumer behaviors and mental health across diverse socio-cultural contexts.
· Behavioral economics approach: principles from behavioral economics could be applied to investigate the psychological drivers that influence the financial decisions during economic changes. This could offer insights into the decision-making processes and non-rational factors that impact economic behaviors.
· Technological and digital trends: analyzing how the many technological advancements shape economic decisions could reveal novel trends.
· Dynamic modeling and predictive analysis: employing advanced modeling techniques could offer predictive insights into future consumer behaviors based on economic indicators.